

THE  PERSONAL  DISTRIBUTION  OF 

INCOME  IN  THE  UNITED  STATES 


FREDERICK  ROBERTSON  MACAULAY 
NATIONAL BUREAU OF ECONOMIC RESEARCH, INC.






Submitted  in  Partial  Fulfilment  of  the  Requirements  for  the  Degree 

of  Doctor  of  Philosophy  in  the  Faculty  of  Political 

Science,  Columbia  University 




NEW  YORK 

HARCOURT,  BRACE  AND  COMPANY 
1922 




COPYRIGHT, 1922, BY
NATIONAL BUREAU OF ECONOMIC RESEARCH, INC.




Printed in the U. S. A.




PREFACE 

In  the  year  1922  the  National  Bureau  of  Economic  Research,  Inc.,  pub- 
lished in  two  volumes  the  result  of  an  investigation  into  "Income  in  the 
United  States."  Part  III  of  Volume  II  of  that  work  consisted  of  the  present 
study.  The  author  acknowledges  with  thanks  the  courtesy  of  the  National 
Bureau  of  Economic  Research  in  permitting  him  to  have  this  reprint  made 
from  the  original  plates.1 

1  This  fact  explains  the  pagination. 
CHAPTER 27
THE  PROBLEM 

What  is  the  frequency  distribution  of  annual  income  among  personal 
income  recipients  in  the  United  States?  Before  we  can  give  an  intelligent 
answer  to  this  question,  we  must  formulate  it  more  definitely  by  indicating 
certain  connotations  which  logic  or  expediency  leads  us  to  attach  to  some 
of  its  terms. 

By  income  it  seems  desirable  to  mean  actual  money  income,  plus  the 
estimated  money  value  of  the  more  important  of  those  items  of  commodity 
or  service  income  on  which  a  money  value  is  ordinarily  placed.  Two  of 
the  most  important  items  which  are  thus  included  are  the  annual  rental 
values  of  owned  homes  and  the  value  of  farm  produce  consumed  by  farmers' 
families. 

In  line  with  the  ordinary  convention,  we  have  excluded  from  our  defini- 
tion of  income,  that  income,  whether  monetary  or  non-monetary,  which  a 
wife  receives  from  her  husband  or  a  child  from  its  parents.1  Not  only  is 
such  exclusion  practically  expedient  but  it  is  also  theoretically  defensible 
and  that  quite  apart  from  the  fact  that  a  money  value  is  not  ordinarily 
placed  on  the  services  of  wife  or  child,  wages  of  housekeepers  to  the  con- 
trary notwithstanding. 

The  frequency  distribution  resulting  from  the  exclusion  of  such  quasi 
incomes  will  be  less  heterogeneous  and  more  significant  and  interpretable 
than  the  distribution  which  would  result  from  inclusion.  For  the  relation 
of  the  incomes  of  wives  and  children  to  the  economic  struggle  is  derived 
and  secondary,  while  that  of  most  other  incomes  is  direct  and  primary. 
Now,  though  the  distribution  of  income  among  persons  is  not  synonymous 
with  distribution  among  the  factors  of  production,  the  two  problems  are 
very  closely  related.  An  individual's  income  may  be  thought  of  as  made 
up  of  wages,  rent,  interest  and  dividends,  profits,  and  gifts  or  allowances. 
If  we  omit  this  last  type  of  income,  the  problem  of  factorial  distribution 
proposes  an  investigation  of  how  and  why  the  individual  received  what 
remains.  Even  if  gifts  and  allowances  admitted  of  any  such  systematic 
and  reasoned  explanation  as  may  be  given  of  rent,  wages,  etc.,  the  ex- 
planation would  be  of  a  totally  different  kind.  Hence,  for  the  purposes  of 
this  investigation,  it  seems  undesirable  to  classify  as  income,  the  receipts, 

1  That  is,  while  such  income  has,  of  course,  been  counted  in  the  first  instance  as  income 
of  the  husband  or  parent  it  has  not  been  re-counted  as  income  of  the  wife  or  child. 
whether monetary or non-monetary, of those persons receiving merely
allowances  or  gifts.1 

Similar  considerations  have  led  us  to  think  of  an  income  recipient  as  an 
individual  rather  than  a  family.  Just  as  it  is  the  husband  and  not  the 
wife,  the  parent  and  not  the  child,  so  it  is  the  individual  and  not  the  family 
who,  as  an  income  receiver,  comes  into  direct  economic  relationship  with 
the  machinery  of  distribution. 

The  chief  argument  in  favor  of  family  rather  than  individual  treatment 
of  the  frequency  distribution  is  based  upon  the  idea  that,  though  income 
accrues  to  the  individual  and  not  the  family,  the  family  is  a  more  significant 
unit  of  economic  need  than  the  individual.  But  this  is  a  different  approach 
to  the  question  and  has,  of  course,  no  intimate  relation  to  the  problem  of 
factorial  distribution.  Moreover,  we  must  remember  that  if  we  are  going 
to  improve  appreciably  upon  the  individual,  even  as  a  need  unit,  we  can- 
not stop  with  actual  biological  families  with  their  great  variation  in  size 
and  constitution,  but  must  introduce  the  concept  of  the  theoretical  family — 
father,  mother  and  three  children,  for  example.  This  last  concept  is,  in 
its  raw  form,  quite  unusable.  The  population  is  not  made  up  of  such 
theoretical  families.  We  may  discuss  what  a  family  of  five  ought  to  get 
to  maintain  a  decent  standard  of  living,  but  we  cannot  divide  the  actual 
population  into  families  of  five  and  discuss  what  these  non-existent  hy- 
pothetical families  actually  do  get.  There  remains  the  alternative  of  ex- 
pressing actual  families  in  terms  of  some  need  unit  such  as  the  "ammain."  2 
While  this  last  procedure  would  probably  yield  an  extremely  interesting 
distribution  based  upon  need  units,  it  is  impractical  to  attempt  any  such 
solution  with  the  data  available.3 

Though  a  distribution  of  income  among  actual  biological  families  would 
appear  to  be  somewhat  less  enlightening  and  interpretable  than  a  dis- 
tribution by  individuals  or  by  ammains,  it  would  have  its  own  peculiar 
interest  and  we  would  have  attempted  its  construction  had  the  data  been 
adequate  for  such  a  purpose.  Most  of  the  data  bearing  on  income  dis- 
tribution are  in  the  individual  form;  wages  distributions,  for  example,  are 

1  Of  course  if  the  wife  or  child  has  "independent"  income,  that  income  is  no  longer  of  the 
nature  of  a  gift  or  allowance  even  though  it  may  arise  from  property  originally  deeded  by 
the  husband  or  father.     It  is  now  explainable  in  terms  of  rent,  interest,  etc. 

If  income  be  defined  as  above,  the  term  personal  income  recipient  will  correspond  closely 
to  the  census  expression  person  gainfully  employed.  Perhaps  the  most  important  difference 
is that the Census does, and we do not, include as separate income recipients farm laborers
working on the home farm.

2 Ammain is a word coined by W. I. King and E. Sydenstricker and defined by them, for
any given class of people, as "a gross demand for articles of consumption having a total
money value equal to that demanded by the average male in that class at the age when his
total requirements for expense of maintenance reach a maximum." Measurement of Relative
Economic  Status  of  Families.  Quarterly  Publications  of  the  American  Statistical  Association, 
Sept.,  1921,  p.  852. 

3 It is of course quite possible to estimate the average per ammain income, as has been done
by  Mr.  King;  the  total  income  of  the  people  can  be  divided  by  the  estimated  number  of 
ammains  in  the  population.     See  pages  233  and  234. 
almost without exception in that form. Now to estimate the frequency
distribution of income among families from data which, in the first place,
are  in  the  individual  form  and,  in  the  second  place,  are  extremely  inade- 
quate for  estimating  even  the  distribution  among  individuals,  could  only 
increase  the  degree  of  uncertainty  in  our  results. 

A  few  words  explaining  the  reason  for  introducing  the  next  chapter  at 
this  point  are  not  out  of  place  here.  The  data  upon  which  an  estimate  of 
even  the  individual  distribution  of  income  in  the  United  States  must  be 
based  impress  one  as  being  in  such  shape  that  it  is  impossible  to  arrive  at 
more  than  the  roughest  sort  of  approximation  by  any  mere  direct  adding 
process.  Some  more  ingenious  plan  would  seem  almost  necessary.  For 
example,  would  it  not  be  possible  to  formulate  a  general  mathematical 
"law"  for  the  distribution  of  incomes  which  law  might  then  be  used  for 
"adjusting"  the  tentative  and  hypothetical  results  obtained  from  piecing 
together  the  existing  scanty  and  inadequate  material? 

The  possibility  and  desirability  of  mathematically  describing  the  fre- 
quency distribution  of  income  would  seem  intimately  tied  up  with  the  case 
for  mathematically  describing  error  distributions  and  statistical  distribu- 
tions in  general.  The  fact  that,  in  our  problem,  the  "law"  would  be  largely 
derived  from  the  same  data  as  those  which  were  to  be  "adjusted"  need 
not  greatly  disturb  us.  The  procedure  of  adjusting  observations  in  the 
light  of  a  mathematical  expression  derived  from  the  same  observations  is 
not  novel.  A  number  of  attempts,  one  of  which  has  become  world-famous, 
have  been  made  to  demonstrate  that  the  distribution  of  income  follows 
a  definite  mathematical  law.  However,  the  next  chapter  will  show  why 
we  fear  that  no  rational  and  useful  mathematical  law  will  soon  be  formu- 
lated. 


CHAPTER  28 

PARETO'S LAW AND THE GENERAL PROBLEM OF MATHEMATICALLY
DESCRIBING THE FREQUENCY DISTRIBUTION OF INCOME

The  problem  of  formulating  a  mathematical  expression  which  shall  de- 
scribe the  frequency  distribution  of  income  in  all  places  and  at  all  times, 
not  only  closely,  but  also  elegantly,  and  if  possible  rationally  as  opposed 
to  empirically,  has  had  great  attractions  for  the  mathematical  economist 
and  statistician.  The  most  famous  of  all  attempts  at  the  solution  of  this 
fascinating  problem  are  those  which  have  been  made  by  Vilfredo  Pareto. 
Professor  Pareto  has  been  intensely  interested  in  this  subject  for  many 
years  and  the  discussion  of  it  runs  through  nearly  all  of  his  published 
work.  The  almost  inevitable  result  is  that  "Pareto's  Law"  appears  in  a 
number  of  slightly  different  forms  and  Professor  Pareto's  feelings  con- 
cerning the  "law"  run  all  the  way  from  treating  it  as  inevitable  and  im- 
mutable to  speaking  of  it  as  "merely  empirical." 

In  its  best  known,  most  famous,  and  most  dogmatic  form,  Pareto's  Law 
runs  about  as  follows: 

1.  In  all  countries  and  at  all  times  the  distribution  of  income  is  such 
that  the  upper  (income-tax)  ranges  of  the  income  frequency  distribution 
curve  may  be  described  as  follows:  If  the  logarithms  of  income  sizes  be 
charted  on  a  horizontal  scale  and  the  logarithms  of  the  numbers  of  persons 
having  an  income  of  a  particular  size  or  over  be  charted  on  a  vertical 
scale,  then  the  resulting  observational  points  will  lie  approximately  along 
a  straight  line.    In  other  words,  if 

$x$ = income size and

$y$ = number of persons having that income or larger,
then $\log y = \log b + m \log x$
or $y = bx^{m}$.1

2.  In  all  countries  and  at  all  recent  times  the  slope  of  this  straight  line 
fitted to the cumulative distribution, that is, the constant $m$ in the equation
$y = bx^{m}$, will be approximately 1.5.2

3.  The  rigidity  and  universality  of  the  two  preceding  conclusions  strongly 

1 If the cumulative distribution (cumulating from the higher towards the lower incomes
as Pareto does) on a double log scale could be exactly described by the equation $y = bx^{m}$,
the non-cumulative distribution could be described by the equation $Y = -mbx^{m-1}$.

2 Strictly, minus 1.5, though Pareto neglects the sign.
suggest that the shape of the income frequency distribution curve on a
double  log  scale  is,  for  all  countries  and  at  all  times,  inevitably  the  same 
not  only  in  the  upper  (income-tax)  range  but  throughout  its  entire  length. 

4.  If  then  the  nature  of  the  whole  income  frequency  distribution  is 
unchanging  and  unchangeable  there  is,  of  course,  no  possibility  of  economic 
welfare  being  increased  through  any  change  in  the  proportion  of  the  total 
income  going  to  the  relatively  poor.  Economic  welfare  can  be  increased 
only  through  increased  production.  In  other  words,  Pareto's  Law  in  this 
extreme  form  constitutes  a  modern  substitute  for  the  Wages  Fund  Doc- 
trine. 
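As a minimal sketch of points 1 and 2, and of the relation in the note to point 1, the Python below evaluates a cumulative count $y = bx^{m}$ with illustrative constants $b = 10^{9}$ and $m = -1.5$ (hypothetical values, not fitted to any returns). It confirms that the slope on a double log scale is $m$ and checks numerically that the non-cumulative frequency is $Y = -mbx^{m-1}$.

```python
import math

# Illustrative constants only; they are not fitted to any income data.
B = 1e9    # scale constant b
M = -1.5   # slope m (Pareto's 1.5 with the sign restored)


def cumulative(x: float) -> float:
    """Persons with income x or larger: y = b * x**m."""
    return B * x ** M


def density(x: float) -> float:
    """Non-cumulative frequency Y = -m * b * x**(m - 1)."""
    return -M * B * x ** (M - 1)


def loglog_slope(x1: float, x2: float) -> float:
    """Slope of the line joining two cumulative points on a double log chart."""
    rise = math.log10(cumulative(x2)) - math.log10(cumulative(x1))
    run = math.log10(x2) - math.log10(x1)
    return rise / run


def numeric_density(x: float, h: float = 1.0) -> float:
    """Minus the central-difference derivative of the cumulative count."""
    return -(cumulative(x + h) - cumulative(x - h)) / (2 * h)


if __name__ == "__main__":
    print(loglog_slope(4_000, 1_000_000))                # equals m up to rounding
    print(density(10_000.0), numeric_density(10_000.0))  # the two agree closely
```

The minus sign in $Y$ appears because the count is cumulated from the higher incomes downward, so $y$ falls as $x$ rises.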

This  is  the  most  dogmatic  form  in  which  the  "law"  appears.  In  his 
later  work  Professor  Pareto  drew  further  and  further  away  from  the  con- 
fidence of  his  first  position.  He  had  early  stated  that  the  straight  line  did 
not  seem  adequate  to  describe  distributions  from  all  times  and  places  and 
had  proposed  more  complicated  equations.1  He  has  held  more  strongly 
to  the  significance  of  the  similarity  of  slopes  but  he  has  wavered  in  his 
faith  that  the  lower  income  portions  of  the  curve  (below  the  income-tax 
minimum)  were  necessarily  similar  for  all  countries  and  all  times.  He  has 
given  up  the  suggestion  that  existing  distributions  are  inevitable  though 
still  speaking  of  the  law  as  true  within  certain  definite  ranges.  To  translate 
from  his  Manuel  (p.  391):  "Some  persons  would  deduce  from  it  a  general 
law  as  to  the  only  way  in  which  the  inequality  of  incomes  can  be  dimin- 
ished. But  such  a  conclusion  far  transcends  anything  that  can  be  derived 
from  the  premises.  Empirical  laws,  like  those  with  which  we  are  here 
concerned,  have  little  or  no  value  outside  the  limits  for  which  they  were 
found  experimentally  to  be  true."  Indeed  Professor  Pareto  has  himself 
drawn  attention  to  so  many  difficulties  inherent  in  the  crude  dogmatic 
form  of  the  law  that  this  chapter  must  not  be  taken  as  primarily  a  criticism 
of  his  work  but  rather  as  a  note  on  the  general  problem  of  mathematically 
describing  the  frequency  distribution  of  incomes. 

Almost  as  soon  as  he  had  formulated  his  law  Professor  Pareto  recognized 
the  impossibility  of  extrapolating  the  straight  line  formula  into  the  lower 
income  ranges  (outside  of  the  income-tax  data  which  he  had  been  using). 
The  straight  line  formula  involves  the  absurdity  of  an  infinite  number  of 
individuals  having  approximately  zero  incomes.  Professor  Pareto  felt 
that  this  zero  mode  with  an  infinite  ordinate  was  absurd.  He  believed 
that  the  curve  must  have  a  definite  mode  at  an  income  size  well  above 
zero  2  and  with  a  finite  number  of  income  recipients  in  the  modal  group. 

1  The  inadequacy  of  these  more  complicated  equations  is  discussed  later.  See  pp.  348,  363 
and  364. 

2  This  is,  of  course,  not  absolutely  necessary.  It  depends  upon  our  definitions  of  income 
and  income  recipient.  If  we  include  the  negligible  money  receipts  of  young  children  living 
at  home  we  might  possibly  have  a  mode  close  to  zero.  There  are  few  children  who  do  not 
really  earn  a  few  pennies  each  year.     Compare  Chart  31A  page  416. 
Having come to the conclusion that the income frequency distribution
curve  must  inevitably  have  a  definite  mode  well  above  zero  income  and 
tail  off  in  both  directions  from  that  mode,  Professor  Pareto  was  led  to 
think  of  the  possibilities  of  the  simplest  of  all  frequency  curves,  the  normal 
curve  of  error.  However,  after  examination  and  consideration,  he  felt 
strongly  that  the  normal  curve  of  error  could  not  possibly  be  used.  He 
became  convinced  that  the  normal  curve  was  not  the  law  of  the  data  for 
the  good  and  sufficient  reason  that  the  part  of  the  data  curve  given  by 
income-tax  returns  is  of  a  radically  different  shape  from  any  part  of  a 
normal  curve.1 

Professor  Pareto  finds  a  further  argument  against  using  the  normal  curve 
in  the  irrationality  of  such  a  curve  outside  the  range  of  the  data. 
The  mode  of  the  complete  frequency  curve  for  income  distribution  is  at 
least  as  low  as  the  minimum  taxable  income.  Income-tax  data  prove  this. 
However,  a  normal  curve  is  symmetrical.  Hence,  if  a  normal  curve  could 
describe  the  upper  ranges  of  the  income  curve  as  given  by  income-tax  data 
then  in  the  lower  ranges  it  would  cut  the  y  axis  and  pass  into  the  second 
quadrant,  in  other  words  show  a  large  number  of  negative  incomes. 
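The argument can be made numerical. The sketch below assumes, purely for illustration, a normal curve whose mode sits at a hypothetical $3,000 taxable minimum and whose spread is chosen so that one recipient in ten thousand exceeds $50,000. Both figures are invented round numbers, not data from the text. Python's standard-library `statistics.NormalDist` then gives the share of the curve lying below zero income.

```python
from statistics import NormalDist

# Hypothetical round numbers chosen for illustration, not taken from any returns.
MODE = 3_000.0           # mode placed at an assumed income-tax minimum
HIGH_INCOME = 50_000.0   # an income level the curve must still reach
SHARE_ABOVE_HIGH = 1e-4  # assumed share of recipients above HIGH_INCOME


def implied_sigma(mode: float, high: float, share_above: float) -> float:
    """Standard deviation that puts `share_above` of a normal curve above `high`."""
    z = NormalDist().inv_cdf(1.0 - share_above)
    return (high - mode) / z


def share_negative(mode: float, sigma: float) -> float:
    """Share of a symmetric normal curve (mode = mean) lying below zero income."""
    return NormalDist(mu=mode, sigma=sigma).cdf(0.0)


if __name__ == "__main__":
    sigma = implied_sigma(MODE, HIGH_INCOME, SHARE_ABOVE_HIGH)
    print(f"sigma = {sigma:,.0f}")
    print(f"share with negative income = {share_negative(MODE, sigma):.1%}")
```

Under these assumptions, roughly two recipients in five would have negative incomes. A curve that reaches far into the income-tax range from a low mode must be wide, and symmetry then carries that width below zero.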

Now,  aside  from  the  fact  that  this  whole  argument  is  unnecessary  if 
the  data  themselves  cannot  be  described  even  approximately  by  a  normal 
curve,  Professor  Pareto's  discussion  reveals  a  curious  change  in  his  middle 
term.  If  he  had  said  that  a  symmetrical  curve  on  a  natural  scale  with  a 
mode  at  least  as  low  as  the  income-tax  minimum  would  show  unbelievably 
large  negative  incomes  we  could  follow  him  but  when  he  states  that  not 
only  can  there  be  no  zero  incomes  but  that  there  can  be  no  incomes  below 
"the  minimum  of  existence"  we  realize  that  he  has  unconsciously  changed 
the  meaning  of  his  middle  term.  Having  examined  a  mass  of  income-tax 
data,  all  of  which  were  concerned  with  net  money  income  and  from  these 
data  having  formulated  a  law,  he  now  apparently  without  realizing  it, 
changes  the  meaning  of  the  word  income  from  net  money  income  to  money 
value  of  commodities  consumed,  and  assumes  that  those  who  receive  a  money 
income  less  than  a  certain  minimum  must  inevitably  die  of  starvation. 

1  Though  Pareto  seems  to  have  thoroughly  understood  this  fact,  his  discussion  is  not  al- 
together satisfactory.  He  states  that  the  data  for  the  higher  incomes  show  a  larger  number 
of  such  incomes  than  the  normal  curve  would  indicate.  This  is  hardly  adequate.  To  have 
stated  that  the  upper  and  lower  ranges  showed  too  many  incomes  as  compared  with  the  middle 
range  would  have  been  better.  An  easy  way  to  realize  clearly  the  impossibility  of  describing 
income-tax  data  by  a  normal  curve  is  to  plot  a  portion  of  the  non-cumulative  data  on  a  natural 
x  log  y  basis.  When  so  charted  the  data  present  a  concave  shaped  curve.  However,  if  the 
data were describable by any part of a normal curve of error, they would show a convex appearance,
or in the limiting case a straight line, as the equation of the normal curve of error,
$y_x = y_0 e^{-x^{2}/2\sigma^{2}}$, becomes, on a natural x log y scale, $\log_e y_x = \log_e y_0 - \frac{x^{2}}{2\sigma^{2}}$, or a second degree
parabola whose axis is perpendicular to the x axis of coordinates.

The  reader  must  note  that  the  limiting  straight  line  case  mentioned  above  is  on  a  natural 
x  log  y  scale  and  not  (as  the  Pareto  straight  line)  on  a  log  x  log  y  scale.  (Note  concluded 
page  347.) 
Children receive in general negligible money incomes. Many other persons
in  the  community  are  in  the  same  position.  A  business  man  may  "lose 
money"  in  a  given  year,  in  other  words  he  may  have  a  negative  money 
income.  There  seems  no  essential  absurdity  in  assuming  that  a  large 
number  of  persons  receive  money  incomes  much  less  than  necessary  to 

(Note  1  page  346  concluded.) 

Chart  28A  showing  curves  fitted  to  observations  on  the  heights  of  men  illustrates  the  ap- 
pearance of  the  normal  curve  on  a  natural  scale  and  on  a  natural  x  log  y  scale.  That  chart 
also  illustrates  another  fact  of  importance  in  this  discussion,  namely,  that  fitting  to  a  different 
function  of  the  variable  gives  a  different  fit. 
CHART 28A
(Heights of men. 1: Normal curve fitted to natural scale, also logs of same. 2: Second degree parabola fitted to natural x log y data by method of least squares, also antilogs of same.)
support existence. When in 1915 Australia took a census of the incomes
of  all  persons  "possessed  of  property,  or  in  receipt  of  income,"  over  14 
per  cent  of  the  returns  showed  incomes  "deficit  and  nil."  1 
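The test proposed in note 1 on page 346 can be sketched with second differences. That test plots non-cumulative frequencies on a natural x, log y basis and examines the curvature. At equally spaced incomes, the log of any normal curve has a constant negative second difference, because it is a parabola opening downward. A Pareto-type density $Y = -mbx^{m-1}$ has positive second differences instead. The curve parameters below are hypothetical.

```python
import math


def second_differences(values):
    """Second differences of a sequence sampled at equal steps."""
    return [values[i - 1] - 2 * values[i] + values[i + 1] for i in range(1, len(values) - 1)]


def log_normal_curve(x, y0=1.0, mu=0.0, sigma=1.0):
    """log_e of the normal curve y0 * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return math.log(y0) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_pareto_density(x, b=1.0, m=-1.5):
    """log_e of the Pareto-type density -m * b * x**(m - 1)."""
    return math.log(-m * b) + (m - 1) * math.log(x)


# Equally spaced incomes (hypothetical grid) and the curvature of each log curve.
xs = [5_000 + 5_000 * i for i in range(10)]
normal_d2 = second_differences([log_normal_curve(x, mu=3_000, sigma=10_000) for x in xs])
pareto_d2 = second_differences([log_pareto_density(x) for x in xs])
print(normal_d2)  # constant and negative: -(step / sigma)**2
print(pareto_d2)  # all positive
```

Applied to actual non-cumulative data, a run of positive second differences is the pattern that rules out every part of a normal curve.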

Professor  Pareto's  realization  of  the  impossibility  of  describing  income 
distributions  by  means  of  normal  curves  led  him  to  the  curious  conclusion 
that  such  distributions  were  somehow  unique  and  could  not  be  explained 
upon  any  "chance"  hypothesis.  "The  shape  of  the  curve  which  is  fur- 
nished us  by  statistics,  does  not  correspond  at  all  to  the  curve  of  errors, 
that  is  to  say  2  to  the  form  which  the  curve  would  have  if  the  acquisition 
and  conservation  of  wealth  depended  only  on  chance."  3  Moreover,  while 
Professor  Pareto's  further  suggestion  of  possible  heterogeneity  in  the  data 
corresponds  we  believe  to  the  facts,  his  reason  for  making  such  a  sug- 
gestion, namely  that  the  data  cannot  be  adequately  described  by  a 
normal  curve,  is  irrelevant.4  "Chance"  data  distributions  are  no  longer 
thought  of  as  necessarily  in  any  way  similar  to  the  normal  curve.  Even 
error  distributions  commonly  depart  widely  from  the  normal  curve. 
The  best  known  system  of  mathematical  frequency  curves,  that  of 
Karl  Pearson,  is  intended  to  describe  homogeneous  material  and  is 
based  upon  a  probability  foundation,  yet  the  normal  curve  is  only 
one of the many and diverse forms yielded by his fundamental
equation

$\frac{d \log y}{dx} = \frac{x + a}{b_0 + b_1 x + b_2 x^{2}}$.5
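Setting $b_1 = b_2 = 0$ in Pearson's fundamental equation, $\frac{d \log y}{dx} = \frac{x + a}{b_0 + b_1 x + b_2 x^{2}}$, and integrating shows why the normal curve is only one member of the family. Writing $b_0 = -\sigma^{2}$ and $a = -\mu$ is a relabelling of constants introduced here, not Pearson's own notation:

$$
\frac{d \log y}{dx} = \frac{x - \mu}{-\sigma^{2}}
\quad\Longrightarrow\quad
\log y = \log y_0 - \frac{(x - \mu)^{2}}{2\sigma^{2}}
\quad\Longrightarrow\quad
y = y_0\, e^{-(x - \mu)^{2} / 2\sigma^{2}} .
$$

With $b_1$ or $b_2$ different from zero, the right-hand side becomes a ratio of a linear to a quadratic expression, and it is these terms that give rise to the skew members of the system.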

While Pareto's Law in its straight line form was at least an interesting
suggestion, his efforts to amend the law have not been fruitful. His attempts
to substitute $\log_e N = \log_e A - \alpha \log_e (x + a)$ or even
$\log_e N = \log_e A - \alpha \log_e (x + a) - \beta x$ for the simpler
$\log N = \log A - \alpha \log x$ have not materially advanced the subject.6
have  the  same  fundamental  drawbacks  as  the  simpler  one.  Among  other 
peculiarities  they  involve  the  same  absurdity  of  an  infinite  number  of 
persons  in  the  modal  interval  and  none  below  the  mode.  Along  with  the 
doubling  of  the  number  of  constants,  there  comes  of  course  the  possibility 
of  improving  the  fit  within  the  range  of  the  data.  Such  improvement  is, 
however,  purely  artificial  and  empirical  and  without  special  significance, 
as  can  be  easily  appreciated  by  noticing  the  mathematical  characteristics 
of  the  equation. 
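The point about extra constants can be checked numerically. The sketch below uses NumPy and synthetic data invented for illustration. It fits the simple form $\log_e N = \log_e A - \alpha \log_e x$ and the amended form $\log_e N = \log_e A - \alpha \log_e (x + a) - \beta x$ by least squares to the same cumulative points. The simple form is the special case $a = 0$, $\beta = 0$ of the amended one, so the amended fit can never be worse within the range of the data, whatever the data are. The grid search over $a$ is an implementation shortcut assumed here, not Pareto's procedure.

```python
import numpy as np


def fit_simple(x, n):
    """Least squares for log N = log A - alpha log x; returns the residual sum of squares."""
    design = np.column_stack([np.ones_like(x), -np.log(x)])
    coef, *_ = np.linalg.lstsq(design, np.log(n), rcond=None)
    resid = np.log(n) - design @ coef
    return float(resid @ resid)


def fit_amended(x, n, a_grid):
    """Least squares for log N = log A - alpha log(x + a) - beta x, searching a over a grid."""
    best = np.inf
    for a in a_grid:
        design = np.column_stack([np.ones_like(x), -np.log(x + a), -x])
        coef, *_ = np.linalg.lstsq(design, np.log(n), rcond=None)
        resid = np.log(n) - design @ coef
        best = min(best, float(resid @ resid))
    return best


# Synthetic cumulative counts, invented for illustration: roughly Pareto with noise.
rng = np.random.default_rng(0)
x = np.array([4e3, 5e3, 1e4, 1.5e4, 2e4, 3e4, 5e4, 1e5, 2e5, 5e5, 1e6])
n = 1e9 * x ** -1.5 * np.exp(rng.normal(0.0, 0.1, size=x.size))

# The grid includes a = 0, so the simple form is nested inside the amended one;
# its lower end keeps x + a positive for every income limit used here.
a_grid = np.concatenate([[0.0], np.linspace(-3_000, 20_000, 231)])
rss_simple = fit_simple(x, n)
rss_amended = fit_amended(x, n, a_grid)
print(rss_simple, rss_amended)
```

The smaller residual of the amended form follows from the nesting alone, which is why such an improvement, by itself, says nothing about whether the extra constants have any economic meaning.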

A  number  of  other  statisticians  have  at  various  times  fitted  different 
types  of  frequency  curves  to  distributions  of  income,  wages,  rents,  wealth, 

1 Compare Table 29A.

2 My italics.

3 Manuel, p. 385. See also Cours, pp. 416 and 417.

4 Vid. Cours, pp. 416 and 417.

5 Professor A. W. Flux in a review of Pareto's Cours d'Economie Politique (Economic Journal,
March, 1897) drew attention to the inadequacy of Pareto's conception of what were and what
were not "chance" data.

6 Cf. Cours, vol. II, p. 305, note.
or allied data.1 However, no one has advanced such claims for a "law"
of  income  2  distribution  as  were  at  one  time  made  by  Professor  Pareto. 
When  considering  the  possibility  of  helpfully  describing  the  distribution 
of  income  by  any  simple  mathematical  expression,  one  inevitably  begins 
by  examining  "Pareto's  Law."  It  is  so  outstanding.  Let  us  therefore 
examine  Pareto's  Law. 

1.  Do  income  distributions,  when  plotted  on  a  double  log  scale, 
approximate  straight  lines  closely  enough  to  give  such  approxi- 
mation much  significance? 

Before  attempting  to  answer  this  question  it  is  of  course  necessary  to 
decide  how  we  shall  obtain  the  straight  line  with  which  comparisons  are 
to  be  made. 

Professor  Pareto  fitted  straight  lines  directly  by  the  method  of  least 
squares  to  the  cumulative  distribution  plotted  on  a  double  log  scale.  The 
disadvantage  of  this  procedure  is  that,  though  one  may  obtain  the  straight 
line  which  best  fits  the  cumulative  distribution,  such  a  straight  line  may  be 
anything  but  an  admirable  fit  to  the  non-cumulative  figures.  For  example, 
if  a  straight  line  be  fitted  by  the  method  of  least  squares  to  Prussian  re- 
turns for  1886  (as  given  by  Professor  Pareto)  the  total  number  of  income 
recipients  within  the  range  of  the  data  is,  according  to  the  fitted  straight 
line,  only  5,399,000  while  the  actual  number  of  returns  was  5,557,000, 
notwithstanding  the  fact  that  Prussia,  1886,  is  a  sample  which  runs  much 
more  nearly  straight  than  is  usual.  How  bad  the  discrepancy  may  be 
where  the  data  do  not  even  approximate  a  straight  line  is  seen  in  Professor 
Pareto's  Oldenburg  material.  There  the  least-squares  straight  line  fitted 
to  the  cumulative  distribution  on  a  double  log  scale  gives  91,222  persons 
having  incomes  over  300  marks  per  annum  while  the  data  give  only  54,309. 
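The least-squares procedure described above can be sketched with the 1914 United States returns of Table 28B (column A). The lowest interval is omitted, as explained in the notes to that table. The sketch assumes the fit is an ordinary least-squares line through base-10 logarithms of the cumulative counts at each class lower limit. Under that assumption the slope comes out close to -1.56, and the line implies roughly 340,000 persons above $4,000, against the 274,761 returns actually filed. These figures are close to, though not necessarily identical with, those in column B of Table 28B.

```python
import numpy as np

# Lower limits of the income classes ($) and numbers of returns, Table 28B, column A (1914).
# The lowest class ($3,000-$4,000) is omitted.
LOWER_LIMITS = np.array([
    4_000, 5_000, 10_000, 15_000, 20_000, 25_000, 30_000, 40_000, 50_000,
    100_000, 150_000, 200_000, 250_000, 300_000, 400_000, 500_000, 1_000_000,
], dtype=float)
RETURNS = np.array([
    66_525, 127_448, 34_141, 15_790, 8_672, 5_483, 6_008, 3_185, 5_161,
    1_189, 406, 233, 130, 147, 69, 114, 60,
], dtype=float)


def cumulative_from_top(counts):
    """Returns at or above each class lower limit (cumulating downward, as Pareto does)."""
    return np.cumsum(counts[::-1])[::-1]


def least_squares_loglog(x, y):
    """Slope m and intercept c of the least-squares line log10 y = c + m log10 x."""
    m, c = np.polyfit(np.log10(x), np.log10(y), deg=1)
    return m, c


cum = cumulative_from_top(RETURNS)
slope, intercept = least_squares_loglog(LOWER_LIMITS, cum)
fitted_total = 10 ** (intercept + slope * np.log10(LOWER_LIMITS[0]))
print(f"slope m = {slope:.3f}")
print(f"fitted count above $4,000 = {fitted_total:,.0f} (actual {cum[0]:,.0f})")
```

The line minimizes squared deviations in the logarithms at all seventeen points equally. Nothing forces it through the first point, which carries by far the largest number of persons.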

1  Among  others,  Karl  Pearson,  F.  Y.  Edgeworth,  Henry  L.  Moore,  A.  L.  Bowley,  Lucien 
March,  J.  C.  Kapteyn,  C.  Bresciani,  C.  Gini,  F.  Savorgnan. 

2 Professor H. L. Moore, in his Laws of Wages, is concerned primarily with wages, not
income.

Professor  J.  C.  Kapteyn  has  presented  a  pretty  but  somewhat  hypothetical  argument  sug- 
gesting that  the  skewness  in  the  income  frequency  curve  should  be  such  that  plotting  on  a 
log  x  basis  would  eliminate  it. 

"In  several  cases  we  feel  at  once  that  the  effect  of  the  causes  of  deviation  cannot  be  inde- 
pendent of  the  dimension  of  the  quantities  observed.  In  such  cases  we  may  conclude  at  once 
that  the  frequency  curve  will  be  a  skew  one.    To  take  a  single  example: 

"Suppose  1000  men  to  begin  trading,  each  with  the  same  capital;  in  order  to  see  how  their 
wealth  will  be  distributed  after  the  lapse  of  10  years,  consider  first  what  will  be  their  condition 
at  some  earlier  epoch,  say  at  the  end  of  the  fifth  year. 

"We  may  admit  that  a  certain  trader  A  will  then  only  possess  a  capital  of  £100,  while 
another  may  possess  £100,000. 

"Now  if  a  certain  cause  of  gain  or  loss  comes  to  operate,  what  will  happen? 

"For  instance:  Let  the  price  of  an  article  in  which  both  A  and  B  have  invested  their  capital, 
rise  or  fall.  Then  it  will  be  evident  that  if  the  gain  or  loss  of  A  be  £10,  that  of  B  will  not  be 
£10,  but  £10,000;  that  is  to  say,  the  effect  of  this  cause  will  not  be  independent  of  the  capital, 
but  proportional  to  it." 

J.  C.  Kapteyn,  Skew  Frequency  Curves  in  Biology  and  Statistics,  p.  13. 
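Kapteyn's argument can be illustrated with a small simulation. Following his example, 1,000 traders each start with the same capital and trade for ten years. The sketch assumes, for illustration only, that in each year every trader gains or loses a random proportion of whatever capital he then holds. The size of these proportional shocks is invented. Because the effects are proportional to capital, wealth comes out skewed to the right while its logarithm is close to symmetrical, which is the sense in which plotting on a log x basis would remove the skewness.

```python
import math
import random
from statistics import mean, median


def simulate_traders(n_traders=1_000, years=10, capital=1_000.0, seed=1):
    """Final capital when each year's gain or loss is proportional to capital held.

    The yearly factor exp(N(0, 0.3)) is a hypothetical choice for illustration.
    """
    rng = random.Random(seed)
    final = []
    for _ in range(n_traders):
        wealth = capital
        for _ in range(years):
            wealth *= math.exp(rng.gauss(0.0, 0.3))  # proportional, not additive, effect
        final.append(wealth)
    return final


def skewness(values):
    """Moment coefficient of skewness."""
    m = mean(values)
    var = mean([(v - m) ** 2 for v in values])
    return mean([(v - m) ** 3 for v in values]) / var ** 1.5


wealth = simulate_traders()
log_wealth = [math.log(w) for w in wealth]
print(f"skewness of wealth:      {skewness(wealth):.2f}")
print(f"skewness of log(wealth): {skewness(log_wealth):.2f}")
print(f"mean {mean(wealth):,.0f} vs median {median(wealth):,.0f}")
```

If the yearly gains and losses were instead fixed sums independent of capital, the final distribution would be roughly symmetrical on the natural scale, which is the contrast Kapteyn draws.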
The reason for this peculiarity of the fit to the cumulative distribution
becomes  clear  when  we  remember  that  the  least-squares  straight  line  may 
easily  deviate  widely  from  the  first  datum  point  while  a  straight  line  giving 
the  same  number  of  income  recipients  as  the  data  must  necessarily  pass 
through  the  first  datum  point.1 

A  straight  line  fitted  in  such  a  manner  that  the  total  number  of  per- 
sons and  total  amount  of  income  correspond  to  the  data  for  these  items 
gives  what  seems  a  much  more  intelligible  fit.  Charts  28B  to  28G  show 
cumulative  United  States  frequency  distributions  from  the  income-tax 
returns  for  the  years  1914  to  1919  on  a  double  log  scale  (Professor  Pareto's 
suggestion).  Two  straight  lines  are  fitted  to  each  distribution — one  a 
solid  least-squares  line  fitted  to  the  cumulative  data  points  and  the 
other  a  dotted  line  so  fitted  that  the  total  number  of  persons  and  total 
amount  of  income  correspond  to  the  data  figures.  While  the  least-squares 
line  may  appear  much  the  better  fit  to  these  cumulative  data,  a  mere 
glance  at  Tables  28B  to  28G  will  reveal  the  fact  that  such  a  line  is,  to 
say  the  least,  a  less  interpretable  fit  to  the  non-cumulative  distribution.2 
It  is,  of  course,  evident  that  neither  line  is  in  any  year  a  sufficiently  good 
fit  to  the  actual  non-cumulative  distribution  to  have  much  significance. 
No  mathematics  is  necessary  to  demonstrate  this.3 
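One way to construct a line of the kind just described is sketched below, under assumptions that are ours rather than the author's. The cumulative count is taken as $y = bx^{m}$ with $m = -\alpha$, treated as a continuous curve running from the lowest income limit $x_0$ upward without any upper limit. Under those assumptions the total income above $x_0$ is $\alpha b x_0^{1-\alpha}/(\alpha - 1)$, so the mean income above $x_0$ is $\alpha x_0/(\alpha - 1)$. Matching the data's total persons $N$ and total income $I$ then gives $\alpha = \bar{x}/(\bar{x} - x_0)$ with $\bar{x} = I/N$, and $b = N x_0^{\alpha}$. The function names and example totals are hypothetical.

```python
def fit_count_and_income(x0: float, total_persons: float, total_income: float):
    """Return (b, m) for y = b * x**m matching total persons and total income above x0.

    Assumes a continuous Pareto curve from x0 to infinity, which requires
    the mean income above x0 to exceed x0 (so that alpha = -m > 1).
    """
    mean_income = total_income / total_persons
    if mean_income <= x0:
        raise ValueError("mean income must exceed the lower limit x0")
    alpha = mean_income / (mean_income - x0)
    b = total_persons * x0 ** alpha
    return b, -alpha


def persons_at_or_above(x: float, b: float, m: float) -> float:
    """Cumulative count y = b * x**m."""
    return b * x ** m


def income_at_or_above(x: float, b: float, m: float) -> float:
    """Total income above x implied by the curve (alpha = -m > 1)."""
    alpha = -m
    return alpha * b * x ** (1 - alpha) / (alpha - 1)


if __name__ == "__main__":
    # Hypothetical totals: 1,000 persons above $4,000 receiving $12,000,000 in all.
    b, m = fit_count_and_income(4_000.0, 1_000.0, 12_000_000.0)
    print(m)                                   # -1.5
    print(persons_at_or_above(4_000.0, b, m))  # 1,000 up to rounding
    print(income_at_or_above(4_000.0, b, m))   # 12,000,000 up to rounding
```

By construction this curve passes through the first datum point, the property the least-squares line lacks. With grouped data that stop at a finite upper limit, the income integral would need adjusting.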

1 e. g. in the case of Pareto's Oldenburg data, the first datum point is x = "over 300M" and y = 54,309
persons.

2  Professor  Warren  M.  Persons  discussed  the  fit  of  the  least-squares  straight  line  to  Professor 
Pareto's  Prussian  data  for  1892  and  1902  in  the  Quarterly  Journal  of  Economics,  May,  1909, 
and  demonstrated  the  badness  of  fit  of  that  line  to  those  data. 

3 The income returned for the years 1914 and 1915 was estimated from the number of returns.
Income is not given in the reports for those years.

In  fitting  straight  lines  to  the  data  of  Tables  28B  to  28G  the  lowest  income  interval  (in 
which  married  persons  making  a  joint  return  are  exempt)  has  always  been  omitted.  To  have 
included  in  our  calculations  these  lowest  intervals  would  have  increased  still  further  the  bad- 
ness of  the  fit  in  the  other  intervals. 
CHART 28B
CHART 28C
CHART 28D
CHART 28F
CHART 28G
TABLE 28B

UNITED STATES INCOME-TAX RETURNS, 1914

                               A              B               C
                           U. S.  Least-squares   Straight line  Per cent  Per cent
Income class          income-tax  straight line  giving correct      A is      A is
                         returns                  total returns      of B      of C
                                                     and income
$3,000-$4,000           (82,754)
4,000-5,000               66,525        101,241          84,683      65.7      78.6
5,000-10,000             127,448        160,545         115,347      79.4     110.5
10,000-15,000             34,141         38,630          32,716      88.4     104.4
15,000-20,000             15,790         15,853          14,102      99.6     112.0
20,000-25,000              8,672          8,230           7,589     105.4     114.3
25,000-30,000              5,483          4,879           4,631     112.4     118.4
30,000-40,000              6,008          5,380           5,267     111.7     114.1
40,000-50,000              3,185          2,793           2,835     114.0     112.3
50,000-100,000             5,161          4,430           4,756     116.5     108.5
100,000-150,000            1,189        1,065.5           1,241     111.6      95.8
150,000-200,000              406          437.3             535      92.8      75.9
200,000-250,000              233          227.1           288.1     102.6      80.9
250,000-300,000              130          134.6           175.5      96.6      74.1
300,000-400,000              147         148.46           199.9      99.0      73.5
400,000-500,000               69          77.06           107.6      89.5      64.1
500,000-1,000,000            114         122.20           180.4      93.3      63.2
1,000,000 and over            60          62.78           107.5      95.6      55.8
Total (over $4,000)      274,761     344,256.00       274,761.0
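Column B can be regenerated from the 1914 least-squares equation listed in the notes to Table 28J. The sketch below makes one assumption, inferred from the totals and not stated in the text: y in that equation is log10 of the number of returns above an income, and x is log10 of that income in dollars. Each interval count is then the difference of the cumulative line at the interval's two limits. The results match the printed column to within a fraction of one per cent, which is about what rounding in the original computations would allow.

```python
import math

# 1914 least-squares line from the notes to Table 28J, read (our inference)
# as log10(returns over x) = 11.153322 - 1.559256 * log10(x), x in dollars.
A_1914, B_1914 = 11.153322, 1.559256


def returns_over(income, a=A_1914, b=B_1914):
    """Cumulative number of returns above `income` implied by the line."""
    return 10 ** (a - b * math.log10(income))


def interval_count(low, high):
    """Returns between `low` and `high` implied by the cumulative line."""
    return returns_over(low) - returns_over(high)


# (lower limit, upper limit, column A of Table 28B)
rows = [(4000, 5000, 66525), (5000, 10000, 127448), (10000, 15000, 34141)]
for low, high, actual in rows:
    fitted = interval_count(low, high)
    print(f"${low:,}-${high:,}: line {fitted:,.0f}, "
          f"A is {100 * actual / fitted:.1f} per cent of it")
```

The comparison that exposes the misfit is between these interval counts and column A, not between cumulative points.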
TABLE 28C

UNITED STATES INCOME-TAX RETURNS, 1915

                               A              B               C
                           U. S.  Least-squares   Straight line  Per cent  Per cent
Income class          income-tax  straight line  giving correct      A is      A is
                         returns                  total returns      of B      of C
                                                     and income
$3,000-$4,000           (69,045)
4,000-5,000               58,949         92,064          68,540      64.0      86.0
5,000-10,000             120,402        154,507         119,634      77.9     100.6
10,000-15,000             34,102         40,358          33,013      84.5     103.3
15,000-20,000             16,475         17,406          14,724      94.7     111.9
20,000-25,000              9,707          9,372           8,124     103.6     119.5
25,000-30,000              6,196          5,716           5,050     108.4     122.7
30,000-40,000              7,005          6,508           5,875     107.6     119.2
40,000-50,000              4,100          3,503           3,241     117.0     126.5
50,000-100,000             6,847          5,880           5,653     116.4     121.1
100,000-150,000            1,793          1,536           1,530     116.7     114.9
150,000-200,000              724          662.5           695.4     103.3     104.1
200,000-250,000              386          356.6           383.8     108.2     100.6
250,000-300,000              216          217.5           238.6      99.3      90.5
300,000-400,000              254          247.7           277.6     102.5      91.5
400,000-500,000              122          133.3           153.2      91.5      79.6
500,000-1,000,000            209          223.8           267.1      93.4      78.2
1,000,000 and over           120          133.6           177.3      89.8      67.7
Total (over $4,000)      267,607      338,825.0       267,607.0
TABLE 28D

UNITED STATES INCOME-TAX RETURNS, 1916

                               A              B               C
                           U. S.  Least-squares   Straight line  Per cent  Per cent
Income class          income-tax  straight line  giving correct      A is      A is
                         returns                  total returns      of B      of C
                                                     and income
$3,000-$4,000           (85,122)
4,000-5,000               72,027        139,096          86,588      51.8      83.2
5,000-6,000               52,029         84,759          54,221      61.4      96.0
6,000-7,000               36,470         56,533          36,899      64.5      98.8
7,000-8,000               26,444         39,846          26,516      66.4      99.7
8,000-9,000               19,959         29,292          19,801      68.1     100.8
9,000-10,000              15,651         22,529          15,445      69.5     101.3
10,000-15,000             45,309         60,668          42,879      74.7     105.7
15,000-20,000             22,618         26,120          19,311      86.6     117.1
20,000-25,000             12,953         14,044          10,726      92.2     120.8
25,000-30,000              8,055          8,558           6,705      94.1     120.1
30,000-40,000             10,068          9,731           7,854     103.5     128.2
40,000-50,000              5,611          5,232           4,362     107.2     128.6
50,000-60,000              3,621          3,189           2,730     113.5     132.6
60,000-70,000              2,548          2,126           1,857     119.8     137.2
70,000-80,000              1,787          1,499         1,334.8     119.2     133.9
80,000-90,000              1,422          1,102           996.8     129.0     142.7
90,000-100,000             1,074            847           777.5     123.8     138.1
100,000-150,000            2,900        2,282.1         2,158.4     127.1     134.4
150,000-200,000            1,284          982.6           972.1     130.7     132.1
200,000-250,000              726          528.2           539.9     137.4     134.5
250,000-300,000              427          321.9           337.6     132.6     126.5
300,000-400,000              469          366.1           395.3     123.1     118.6
400,000-500,000              245          198.8           219.6     124.5     111.6
500,000-1,000,000            376          329.6           387.4     114.1      97.1
1,000,000-1,500,000           97          85.83           108.7     113.0      89.2
1,500,000-2,000,000           42          36.96           48.88     113.6      85.9
2,000,000-3,000,000           34          31.98           44.19     106.3      76.9
3,000,000-4,000,000           14          13.77           19.91     101.7      70.3
4,000,000-5,000,000            9           7.40           11.05     121.6      81.4
5,000,000 and over            10          19.76           32.87      50.6      30.4
Total (over $4,000)      344,279     510,374.00      344,279.00
TABLE 28E

UNITED STATES INCOME-TAX RETURNS, 1917

                               A              B               C
                           U. S.  Least-squares   Straight line  Per cent  Per cent
Income class          income-tax  straight line  giving correct      A is      A is
                         returns                  total returns      of B      of C
                                                     and income
$1,000-$2,000        (1,640,758)
2,000-2,500              480,486        618,069         517,512      77.7      92.8
2,500-3,000              358,221        367,835         284,620      97.4     125.9
3,000-4,000              374,958        407,366         376,117      92.0      99.7
4,000-5,000              185,805        212,569         184,854      87.4     100.5
5,000-6,000              105,988        126,507         111,097      83.8      95.4
6,000-7,000               64,010         82,746          73,355      77.4      87.3
7,000-8,000               44,363         57,357          51,285      77.3      86.5
8,000-9,000               31,769         41,556          37,362      76.4      85.0
9,000-10,000              24,536         31,551          28,551      77.8      85.9
10,000-11,000             19,221         24,097          21,900      79.8      87.8
11,000-12,000             15,035         19,412          17,747      77.5      84.7
12,000-13,000             12,328         15,707          14,440      78.5      85.4
13,000-14,000             10,427         12,751          11,761      81.8      88.7
14,000-15,000              8,789         10,709           9,909      82.1      88.7
15,000-20,000             29,896         34,161          31,891      87.5      93.7
20,000-25,000             16,806         17,825          16,876      94.3      99.6
25,000-30,000             10,571         10,609          10,159      99.6     104.1
30,000-40,000             12,733         11,749          11,385     108.4     111.8
40,000-50,000              7,087          6,130           6,021     115.6     117.7
50,000-60,000              4,541          3,649           3,622     124.4     125.4
60,000-70,000              2,954          2,387           2,391     123.8     123.5
70,000-80,000              2,222        1,653.5           1,672     134.4     132.9
80,000-90,000              1,539        1,198.5         1,217.9     128.4     126.4
90,000-100,000             1,183          910.0           930.8     130.0     127.1
100,000-150,000            3,302        2,384.4         2,469.5     138.5     133.7
150,000-200,000            1,302          985.2         1,039.6     132.2     125.2
200,000-250,000              703          514.1           550.5     136.7     127.7
250,000-300,000              342          305.9           330.8     111.8     103.4
300,000-400,000              380          338.9           371.2     112.1     102.4
400,000-500,000              179          176.8           196.3     101.2      91.2
500,000-750,000              225         199.96          225.56     112.5      99.8
750,000-1,000,000             90          82.61           94.97     108.9      94.8
1,000,000-1,500,000           67          68.77           80.51      97.4      83.2
1,500,000-2,000,000           33          28.42           33.90     116.1      97.3
2,000,000-3,000,000           24          23.65           28.71     101.5      83.6
3,000,000-4,000,000            5           9.77           12.10      51.2      41.3
4,000,000-5,000,000            8           5.10            6.40     156.9     125.0
5,000,000 and over             4          12.42           16.25      32.2      24.6
Total (over $2,000)    1,832,132   2,123,640.00    1,832,132.00
TABLE 28F

UNITED STATES INCOME-TAX RETURNS, 1918

                               A              B               C
                           U. S.  Least-squares   Straight line  Per cent  Per cent
Income class          income-tax  straight line  giving correct      A is      A is
                         returns                  total returns      of B      of C
                                                     and income
$1,000-$2,000        (1,516,938)
2,000-3,000            1,496,878      1,375,372       1,470,366     108.8     101.8
3,000-4,000              610,095        537,892         566,044     113.4     107.8
4,000-5,000              322,241        269,674         280,477     119.5     114.9
5,000-6,000              126,554        155,513         160,366      81.4      78.9
6,000-7,000               79,152         99,102         101,389      79.9      78.1
7,000-8,000               51,381         67,184          68,258      76.5      75.3
8,000-9,000               35,117         47,740          48,266      73.6      72.8
9,000-10,000              27,152         35,628          35,795      76.2      75.9
10,000-11,000             20,414         26,793          26,832      76.2      76.1
11,000-12,000             16,371         21,283          21,231      76.9      77.1
12,000-13,000             13,202         16,999          16,873      77.7      78.2
13,000-14,000             10,882         13,638          13,515      79.8      80.5
14,000-15,000              9,123         11,328          11,165      80.5      81.7
15,000-20,000             30,227         35,214          34,486      85.8      87.7
20,000-25,000             16,350         17,654          17,097      92.6      95.6
25,000-30,000             10,206         10,181           9,762     100.2     104.5
30,000-40,000             11,887         10,886          10,336     109.2     115.0
40,000-50,000              6,449          5,458           5,121     118.2     125.9
50,000-60,000              3,720          3,147           2,928     118.2     127.0
60,000-70,000              2,441          2,006           1,852     121.7     131.8
70,000-80,000              1,691        1,359.5           1,246     124.4     135.7
80,000-90,000              1,210          966.2           881.4     125.2     137.3
90,000-100,000               934          721.0           653.7     129.5     142.9
100,000-150,000            2,358        1,822.3         1,636.3     129.4     144.1
150,000-200,000              866          712.7           629.8     121.5     137.5
200,000-250,000              401          357.3           312.1     112.2     128.5
250,000-300,000              247          208.0           178.3     119.9     138.5
300,000-400,000              260          220.3           188.7     118.0     137.8
400,000-500,000              122          110.5           93.55     110.4     130.4
500,000-750,000              132         119.28           99.70     110.7     132.4
750,000-1,000,000             46          46.66           38.36      98.6     119.9
1,000,000-1,500,000           33          36.88           29.88      89.5     110.4
1,500,000-2,000,000           16          14.42           11.50     111.0     139.1
2,000,000-3,000,000           11          11.40            8.96      96.5     122.8
3,000,000-4,000,000            4           4.46            3.44      89.7     116.3
4,000,000-5,000,000            2           2.24            1.71      89.3     117.0
5,000,000 and over             1           4.86            3.60      20.6      27.8
Total (over $2,000)    2,908,176   2,769,408.00    2,908,176.00
TABLE 28G

UNITED STATES INCOME-TAX RETURNS, 1919

                               A              B               C
                           U. S.  Least-squares   Straight line  Per cent  Per cent
Income class          income-tax  straight line  giving correct      A is      A is
                         returns                  total returns      of B      of C
                                                     and income
$1,000-$2,000        (1,924,872)
2,000-3,000            1,569,741      1,984,285       1,673,688      79.1      93.8
3,000-4,000              742,334        764,739         660,950      97.1     112.3
4,000-5,000              438,154        379,330         333,645     115.5     131.3
5,000-6,000              167,005        216,921         193,470      77.0      86.3
6,000-7,000              109,674        137,278         123,953      79.9      88.5
7,000-8,000               73,719         92,511          84,273      79.7      87.5
8,000-9,000               50,486         65,403          60,066      77.2      84.1
9,000-10,000              37,967         48,583          44,980      78.1      84.4
10,000-11,000             28,499         36,386          33,887      78.3      84.1
11,000-12,000             22,841         28,796          27,027      79.3      84.5
12,000-13,000             18,423         22,921          21,600      80.4      85.3
13,000-14,000             15,248         18,329          17,395      83.2      87.7
14,000-15,000             12,841         15,181          14,459      84.6      88.8
15,000-20,000             42,028         46,868          45,162      89.7      93.1
20,000-25,000             22,605         23,249          22,797      97.2      99.2
25,000-30,000             13,769         13,294          13,228     103.6     104.1
30,000-40,000             15,410         14,084          14,219     109.4     108.4
40,000-50,000              8,298          6,986           7,178     118.8     115.6
50,000-60,000              5,213          3,994           4,162     130.5     125.3
60,000-70,000              3,196          2,528           2,665     126.4     119.9
70,000-80,000              2,237          1,704           1,813     131.3     123.4
80,000-90,000              1,561          1,205           1,292     129.5     120.8
90,000-100,000             1,113            894           968.3     124.5     114.9
100,000-150,000            2,983          2,240         2,461.5     133.2     121.2
150,000-200,000            1,092          863.2           971.6     126.5     112.4
200,000-250,000              522          428.1           490.4     121.9     106.4
250,000-300,000              250          245.0           284.4     102.0      87.9
300,000-400,000              285          259.2           306.0     110.0      93.1
400,000-500,000              140          128.6           154.4     108.9      90.7
500,000-750,000              129         137.32           168.2      93.9      76.7
750,000-1,000,000             60          52.89            66.4     113.4      90.4
1,000,000-1,500,000           34          41.25           52.95      82.4      64.2
1,500,000-2,000,000           13          15.89           20.90      81.8      62.2
2,000,000-3,000,000            7          12.40           16.68      56.5      42.0
3,000,000 and over            11          12.15           17.27      90.5      63.7
Total (over $2,000)    3,407,888   3,929,905.00    3,407,888.00
Why do the least-squares straight lines appear graphically such good fits to the cumulative distributions (for at least the later years) when a merely arithmetic analysis shows even this fit to the cumulative data to be so illusory? Because the percentage range in the number of persons is so extremely wide. The deviations of the cumulative data on a double log scale from the least-squares straight line are minute when compared with the percentage changes in the data from the smallest to the largest incomes. But this is not helpful. The fact that there are 100,000 times as many persons having incomes over $2,000 per annum as there are persons having incomes over $5,000,000 per annum does not make a theoretical reading for a particular income interval of twenty or thirty per cent over or under the data reading an unimportant deviation. Charting data on a double log scale may thus become a fertile source of error unless accompanied by careful interpretation.1 This fact has long been recognized by engineers and others who have had much experience with similar problems in curve fitting.
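Footnote 1 makes this point with Pareto's Oldenburg figures. Logarithmic deviations of a few hundredths there correspond to interval discrepancies of up to nearly 30 per cent. The sketch below rebuilds the footnote's second table from its first by differencing the cumulative counts. The computed interval counts are copied from the footnote, not recalculated from Pareto's equation.

```python
# Oldenburg, 1890, from the footnote on Pareto's fit: persons with income over
# each limit (marks) and the interval counts computed from his curve.
limits = [300, 600, 900, 1500, 3000, 6000, 9000, 15300, 30000]
over = [54309, 24043, 16660, 9631, 3502, 994, 445, 140, 25]
computed = [26969, 10342, 8270, 5560, 2169, 534, 312, 131, 22]

# Non-cumulative counts: persons between consecutive limits, then the top class.
actual = [n_here - n_next for n_here, n_next in zip(over, over[1:])]
actual.append(over[-1])

ratios = [round(100 * a / c, 1) for a, c in zip(actual, computed)]
labels = [f"{lo:,}-{hi:,}" for lo, hi in zip(limits, limits[1:])]
labels.append(f"Over {limits[-1]:,}")
for label, a, c, r in zip(labels, actual, computed, ratios):
    print(f"{label:>14}  actual {a:>6,}  computed {c:>6,}  {r:5.1f} per cent")
```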

Another matter of some importance must be noted here. The deviations of the data from the straight lines might be much less than they are and yet constitute extremely bad fits. The data points (even on a non-cumulative basis) do not flutter erratically from side to side of the fitted lines; they run smoothly, passing through the fitted line at small angles in the way that one curve cuts another. Now, in curve fitting, such a condition always strongly suggests that the particular mathematical curve used is not in any sense the "law" of the data.
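A Wald-Wolfowitz runs test on the signs of the deviations is one modern way to put this argument in numbers; the author does not use it. If a curve were the "law" of the data, points above and below it should alternate roughly at random, and very few runs signal systematic misfit. The sketch applies the usual normal approximation to the 1918 column "Per cent A is of B" of Table 28F, counting any value over 100 as a point above the least-squares line.

```python
import math

# Table 28F (1918), "Per cent A is of B", from the $2,000-$3,000 class upward
pct_a_of_b = [108.8, 113.4, 119.5, 81.4, 79.9, 76.5, 73.6, 76.2, 76.2, 76.9,
              77.7, 79.8, 80.5, 85.8, 92.6, 100.2, 109.2, 118.2, 118.2, 121.7,
              124.4, 125.2, 129.5, 129.4, 121.5, 112.2, 119.9, 118.0, 110.4,
              110.7, 98.6, 89.5, 111.0, 96.5, 89.7, 89.3, 20.6]


def runs_test(signs):
    """Wald-Wolfowitz runs test (normal approximation).

    signs: sequence of booleans. Returns (runs, n_true, n_false, z).
    """
    n_true = sum(signs)
    n_false = len(signs) - n_true
    n = n_true + n_false
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    mean = 2 * n_true * n_false / n + 1
    var = (2 * n_true * n_false * (2 * n_true * n_false - n)
           / (n ** 2 * (n - 1)))
    return runs, n_true, n_false, (runs - mean) / math.sqrt(var)


above = [p > 100 for p in pct_a_of_b]  # True where the data exceed the line
runs, n_above, n_below, z = runs_test(above)
print(f"{runs} runs ({n_above} above, {n_below} below the line), z = {z:.2f}")
```

The 37 deviations fall into only six runs, where about 19 or 20 would be expected by chance. This matches the smooth crossing the text describes.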

2.  Are  the  slopes  of  the  straight  lines  fitted  to  income  data 
from  different  times  and  places  similar  in  any  significant  degree? 

1 The dangers of fitting curves with such a combination as a cumulative distribution and a double log scale, without further analysis, are well illustrated by the results Professor Pareto obtained for Oldenburg. To the Oldenburg data he fitted the rather complicated equation $\log N = \log A - \alpha \log(x + a) - \beta x$ and obtained the following results. (The value Pareto gives for $\beta$, namely .0000631, does not check with his calculated figures given below; $\beta = .0000274$ is evidently what he intended.)


Income in                  Logarithms of N
marks (over)           N   Observed  Calculated  Difference
300               54,309     4.7349      4.7349
600               24,043     4.3810      4.4368      −.0558
900               16,660     4.2217      4.2304      −.0086
1,500              9,631     3.9837      3.9409      +.0428
3,000              3,502     3.5443      3.5008      +.0435
6,000                994     2.9974      2.9997      −.0023
9,000                445     2.6484      2.6671      −.0187
15,300               140     2.1461      2.1838      −.0377
30,000                25     1.3979      1.3364      +.0615

(From Cours d'Économie Politique, vol. II, p. 307.)

The above table may give the reader a vague idea that the fit is rather good. However, the following table may be derived directly from it:

                    Number of persons       Per cent
Income in marks     Actual   Computed      actual is
                                         of computed
300-600             30,266     26,969          112.2
600-900              7,383     10,342           71.4
900-1,500            7,029      8,270           85.0
1,500-3,000          6,129      5,560          110.2
3,000-6,000          2,508      2,169          115.6
6,000-9,000            549        534          102.8
9,000-15,300           305        312           97.8
15,300-30,000          115        131           87.8
Over 30,000             25         22          113.6
Total               54,309     54,309          100.0

The fit no longer impresses one as quite so good. See Chart 28H below.

CHART 28H
OLDENBURG INCOME TAX RETURNS, 1890
CUMULATIVE FREQUENCY DISTRIBUTION WITH TWO FITTED CURVES
(1) $\log y = 9.0077 - 1.6321 \log x$ (least-squares straight line)
(2) $\log y = \log A - 1.465 \log(x + 220) - .0000274x$ (Pareto's second approximation)

If income distributions charted on a double log scale not only cannot be approximately represented by straight lines, but also differ radically in shape, it is of course not of great importance whether the straight lines fitted to such data from different times and places have or have not approximately constant slopes. For example, a comparison of Chart 28C, showing the cumulative distribution of United States income-tax returns for 1915 on a double log scale, with Chart 28F, showing similar data for 1918, makes it plain that, even were the slopes of the fitted straight lines for the two years identical, the data curves would still be so different as to make the similarity of slope of the fitted lines of almost no significance.1
In considering slopes, let us examine further both the data and the fitted lines for these two years, 1915 and 1918. Tables 28I and 28J give some numerical illustrations of the differences between the distributions for the two years. Table 28I gives the number of returns in each income interval each year and the ratio of the 1918 figures to the 1915 figures.

TABLE 28I

COMPARISON OF UNITED STATES INCOME-TAX RETURNS FOR 1915 AND 1918

                       Number of returns    Ratio of 1918
Income class              1915        1918        to 1915
$4,000-$5,000 (a)       58,949     322,241         5.4664
5,000-10,000           120,402     319,356         2.6524
10,000-15,000           34,102      69,992         2.0524
15,000-20,000           16,475      30,227         1.8347
20,000-25,000            9,707      16,350         1.6844
25,000-30,000            6,196      10,206         1.6472
30,000-40,000            7,005      11,887         1.6969
40,000-50,000            4,100       6,449         1.5729
50,000-100,000           6,847       9,996         1.4599
100,000-150,000          1,793       2,358         1.3151
150,000-200,000            724         866         1.1961
200,000-250,000            386         401         1.0389
250,000-300,000            216         247         1.1435
300,000-400,000            254         260         1.0236
400,000-500,000            122         122         1.0000
500,000-1,000,000          209         178          .8517
1,000,000 and over         120          67          .5583

a The $3,000-$4,000 class is not included, as in 1915 married persons in that class were exempted while in 1918 they were not.
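Table 28I can be rebuilt from Tables 28C and 28F by summing the finer 1918 classes into the coarser 1915 ones. The sketch below does that and recomputes the ratio column. `regroup` is a helper of our own, and the counts are copied from the two tables.

```python
# 1915 classes and returns (Table 28C), $4,000-$5,000 class upward;
# None marks the open-ended top class.
classes_1915 = [
    (4000, 5000, 58949), (5000, 10000, 120402), (10000, 15000, 34102),
    (15000, 20000, 16475), (20000, 25000, 9707), (25000, 30000, 6196),
    (30000, 40000, 7005), (40000, 50000, 4100), (50000, 100000, 6847),
    (100000, 150000, 1793), (150000, 200000, 724), (200000, 250000, 386),
    (250000, 300000, 216), (300000, 400000, 254), (400000, 500000, 122),
    (500000, 1000000, 209), (1000000, None, 120),
]

# 1918 returns by lower class limit (Table 28F), $4,000 upward
fine_1918 = {
    4000: 322241, 5000: 126554, 6000: 79152, 7000: 51381, 8000: 35117,
    9000: 27152, 10000: 20414, 11000: 16371, 12000: 13202, 13000: 10882,
    14000: 9123, 15000: 30227, 20000: 16350, 25000: 10206, 30000: 11887,
    40000: 6449, 50000: 3720, 60000: 2441, 70000: 1691, 80000: 1210,
    90000: 934, 100000: 2358, 150000: 866, 200000: 401, 250000: 247,
    300000: 260, 400000: 122, 500000: 132, 750000: 46, 1000000: 33,
    1500000: 16, 2000000: 11, 3000000: 4, 4000000: 2, 5000000: 1,
}


def regroup(low, high):
    """Sum the 1918 classes whose lower limit lies in [low, high)."""
    return sum(n for lim, n in fine_1918.items()
               if lim >= low and (high is None or lim < high))


ratios = []
for low, high, n_1915 in classes_1915:
    n_1918 = regroup(low, high)
    ratios.append(n_1918 / n_1915)
    print(f"{low:>9,}  {n_1915:>7,}  {n_1918:>7,}  {n_1918 / n_1915:.4f}")
```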


The change as we pass from the $4,000-$5,000 interval, where the 1918 figures are nearly five-and-a-half times the 1915 figures, to the intervals above $500,000, where the 1918 figures are actually less than the 1915 figures, illustrates the great and fundamental difference between the slopes of the two distributions. However, such a comparison of unadjusted money intervals, while it throws into relief the differences in slope of the two distributions, is by no means as enlightening for purposes of exhibiting their other essential dissimilarities as a comparison of the two sets of data after they have been adjusted for changes in average (per capita) income and changes in population. Table 28J gives some comparisons between the data for the two years and between the fitted lines for the two years on such an adjusted basis. Two intervals, one in the relatively low income range and the other in the high income range, are used to illustrate the essentially different character of the distributions for the two years.

1 Compare also the deviations from the fitted lines as given in Tables 28C and 28F.

TABLE 28J

COMPARISONS OF UNITED STATES INCOME-TAX RETURNS FOR THE YEARS 1915 AND 1918 ADJUSTED FOR CHANGES IN AVERAGE (PER CAPITA) INCOME AND CHANGES IN POPULATION

ACTUAL INCOME-TAX DATA

                          Number of returns      Fraction of population         Ratio of
Income intervals              1915        1918         1915           1918    column (4)
                               (1)         (2)          (3)            (4) to column (3)
Between 12 and 13
  times average income      21,190      31,197    .00021099      .00029945        1.4193
Between 1,200 and 1,300
  times average income       43.85       20.37  .0000004366    .0000001955         .4478
Over 12 times average
  income                   248,600     271,452    .00247536      .00260561        1.0526

                                Amount in dollars         Per cent of total income
                                    1915              1918        1915        1918
Over 12 times average
  income                  $4,283,010,735    $5,312,832,516       11.9%        8.7% .7311

LEAST-SQUARES STRAIGHT LINES

                          Number of returns      Fraction of population         Ratio of
Income intervals              1915        1918         1915           1918    column (4)
                               (1)         (2)          (3)            (4) to column (3)
Between 12 and 13
  times average income      32,886      41,730    .00032745      .00040056        1.2233
Between 1,200 and 1,300
  times average income       47.63       17.10  .0000004743    .0000001641         .3460

STRAIGHT LINES FITTED TO GIVE THE SAME TOTAL NUMBER OF RETURNS AND THE SAME TOTAL INCOME AS THE INCOME-TAX DATA

                          Number of returns      Fraction of population         Ratio of
Income intervals              1915        1918         1915           1918    column (4)
                               (1)         (2)          (3)            (4) to column (3)
Between 12 and 13
  times average income      24,510      42,460    .00024405      .00040756        1.6700
Between 1,200 and 1,300
  times average income       54.73       14.15  .0000005450    .0000001358         .2492
NOTES TO TABLE 28J

"Average Income" Intervals

                                    1915        1918
Average income (per capita)         $358        $586
12 times average income            4,296       7,032
13 times average income            4,654       7,618
1,200 times average income       429,600     703,200
1,300 times average income       465,400     761,800

Equations of Fitted Straight Lines on a Cumulative Double Log Basis

Year    Least-squares lines              Lines giving correct total number
                                         of returns and total income
1914    y = 11.153322 − 1.559256x        y = 10.557242 − 1.420936x
1915    y = 10.643299 − 1.419579x        y = 10.202382 − 1.325598x
1916    y = 10.839435 − 1.424638x        y = 10.212702 − 1.298088x
1917    y = 11.410606 − 1.539996x        y = 11.170980 − 1.486817x
1918    y = 12.033697 − 1.693823x        y = 12.202452 − 1.738497x
1919    y = 12.320963 − 1.734802x        y = 12.036155 − 1.667258x
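The notes do not say what x and y denote. Suppose y is read as log10 of the number of returns above an income, and x as log10 of that income in dollars; this is our inference. Each equation then reproduces the totals of columns B and C of Tables 28B to 28G when evaluated at the lowest limit used in fitting: $4,000 for 1914 to 1916 and $2,000 for 1917 to 1919. The sketch below performs that check.

```python
import math

# Each line is read (our inference) as log10(N above x) = a - b * log10(x).
# (year, lowest limit used, least-squares (a, b), correct-total (a, b))
LINES = [
    (1914, 4000, (11.153322, 1.559256), (10.557242, 1.420936)),
    (1915, 4000, (10.643299, 1.419579), (10.202382, 1.325598)),
    (1916, 4000, (10.839435, 1.424638), (10.212702, 1.298088)),
    (1917, 2000, (11.410606, 1.539996), (11.170980, 1.486817)),
    (1918, 2000, (12.033697, 1.693823), (12.202452, 1.738497)),
    (1919, 2000, (12.320963, 1.734802), (12.036155, 1.667258)),
]

# Totals of columns B and C in Tables 28B-28G
TOTALS = {
    1914: (344256.00, 274761.0), 1915: (338825.0, 267607.0),
    1916: (510374.00, 344279.00), 1917: (2123640.00, 1832132.00),
    1918: (2769408.00, 2908176.00), 1919: (3929905.00, 3407888.00),
}


def n_above(x, line):
    """Number of returns above income x implied by the line (a, b)."""
    a, b = line
    return 10 ** (a - b * math.log10(x))


checks = {}
for year, x0, least_squares, correct_total in LINES:
    total_b, total_c = TOTALS[year]
    checks[year] = (n_above(x0, least_squares) / total_b,
                    n_above(x0, correct_total) / total_c)
    print(year, "line/total: B %.4f, C %.4f" % checks[year])
```

Every ratio comes out within a few hundredths of one per cent of 1, which supports this reading of the equations without proving it.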

Table 28J needs little discussion. In the section treating actual income-tax data we notice that while the adjusted number of returns in the lower income interval1 increased 41.93 per cent from 1915 to 1918, the adjusted number of returns in the upper income interval2 decreased 55.22 per cent. Moreover, while the adjusted total number of returns above the "12-times-average-income" point increased 5.26 per cent, the adjusted amount of income reported in these returns decreased 26.89 per cent.
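The percentages in the paragraph above are the right-hand ratios of Table 28J restated as changes. The short sketch below recomputes them from the fractions of population in columns (3) and (4). For the income figure it uses the printed shares of total income, which the table gives only to one decimal.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100 * (after / before - 1)


# Actual income-tax data in Table 28J: (1915, 1918)
ACTUAL = {
    "returns, 12 to 13 times average income": (0.00021099, 0.00029945),
    "returns, 1,200 to 1,300 times average income": (0.0000004366, 0.0000001955),
    "returns over 12 times average income": (0.00247536, 0.00260561),
    "income over 12 times average, per cent of total": (11.9, 8.7),
}

changes = {label: round(pct_change(*pair), 2) for label, pair in ACTUAL.items()}
for label, change in changes.items():
    print(f"{label}: {change:+.2f} per cent")
```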

Such figures suggest a rather radical change in the distribution of income during this short three-year period. Similar conclusions may be drawn from the figures for the two pairs of fitted lines, though we must of course remember that these lines describe the actual data only very inadequately. The lines so fitted as to give each year the same total number of returns and total amount of income as the data for that year yield sensational results. While the adjusted number of returns in the lower income interval increased 67 per cent, the adjusted number of returns in the upper income interval decreased 75.08 per cent.
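The fitted-line figures in Table 28J come from evaluating the cumulative lines over intervals scaled to each year's average income. The sketch below rests on two assumptions. First, the equations in the notes to Table 28J are read as log10 of returns above x against log10 of x. Second, the populations, which are not printed in this section, are back-calculated from Table 28J itself as returns divided by fraction of population. Under those assumptions the least-squares lines reproduce the 12-to-13-times counts closely.

```python
import math


def n_above(x, a, b):
    """Returns above income x for the line log10 N = a - b * log10 x."""
    return 10 ** (a - b * math.log10(x))


def between(low, high, a, b):
    """Returns between two incomes implied by the cumulative line."""
    return n_above(low, a, b) - n_above(high, a, b)


AVERAGE = {1915: 358, 1918: 586}  # per capita income, notes to Table 28J
LEAST_SQUARES = {1915: (10.643299, 1.419579), 1918: (12.033697, 1.693823)}
# Approximate populations back-calculated from Table 28J itself
# (returns divided by fraction of population); not printed in this section.
POPULATION = {1915: 21190 / 0.00021099, 1918: 31197 / 0.00029945}

fraction = {}
for year in (1915, 1918):
    avg = AVERAGE[year]
    count = between(12 * avg, 13 * avg, *LEAST_SQUARES[year])
    fraction[year] = count / POPULATION[year]
    print(year, f"{count:,.0f} returns, fraction of population {fraction[year]:.8f}")

ratio = fraction[1918] / fraction[1915]
print(f"ratio of 1918 to 1915: {ratio:.4f}")
```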

Finally, it has been suggested that changes in the characteristics of the tax-income distribution in the United States from 1915 to 1918 may be accounted for as the results of the increase in the surtax rates beginning with 1917. We do not believe any large part of these changes can be so accounted for. Notwithstanding the fact that the country entered the European war during the interval, the difference between the 1915 distribution and the 1918 distribution in the United States, extreme as it is, cannot be said to be unreasonably or unbelievably great. Even the changes in the slope of the least-squares line are not phenomenal. Pareto's Prussian figures contain fluctuations in slope from −1.60 to −1.89, while the slope of the least-squares straight line fitted to his Basle data is only −1.25. The slopes of the least-squares straight lines fitted to the American data are −1.42 for 1915 and −1.69 for 1918.

1 Between 12 and 13 times the average income (per capita) each year.

2 Between 1,200 and 1,300 times the average income (per capita) each year.

3.  If  the  upper  income  ranges  (or  "tails")  of  income  distributions 
were,  when  charted  on  a  double  log  scale,  closely  similar  in  shape, 
would  that  fact  justify  the  assumption  that  the  lower  income  ranges 
were  likewise  closely  similar? 

Before  attempting  to  answer  the  above  question,  let  us  summarize  the 
case  we  have  just  made  against  believing  the  "tails"  significantly  similar. 
We  can  then  discuss  how  much  importance  such  similarity  would  have 
did  it  exist. 

We have found upon examination that the approximation to straight lines of the tails of income distributions plotted on double log scales is specious; that the slopes of the fitted straight lines differ sufficiently to produce extreme variations in the relative number of income recipients in the upper as compared with the lower income ranges of the tails; that the upper and lower income ranges of the actual data for different times or places tell a similar story of extreme variation; and that the irregularities in shape of the tails of the actual data, entirely aside from any question of approximating or not approximating straight lines of constant slope, vary greatly from year to year and from country to country, ranging all the way from the irregularities of such distributions as the Oldenburg data, through the American data for 1914, 1915 and 1916, to such an entirely different set of irregularities as those seen in the American data for 1918.1

At this stage of the discussion the reader may ask whether a general appearance of approximating straight lines on a double log scale, poor as the actual fit may be found to be under analysis, has not some meaning, some significance. The answer to this question must be that, if we were dealing not with a frequency distribution but with a correlation table showing a relationship between two variables, an approximation of the regression lines to linearity when charted on a double log scale might easily be the clue to a first approximation to a rational law; but that, on the other hand, approximate linearity in the tail of a frequency distribution charted on a double log scale signifies relatively little because it is such a common characteristic of frequency distributions of many and varied types.

1 Compare Charts 28H, 28B, 28C, 28D and 28F.
2 C. P. Steinmetz, Engineering Mathematics, p. 216.

The straight line on a double log scale or, in other words, the equation $y = bx^m$, when used to express a relationship between two variables, is, to quote a well-known text on engineering mathematics, "one of the most useful classes of curves in engineering."2 In deciding what type of equation to use in fitting curves by the method of least squares to data concerning two variables, the texts usually mention $y = bx^m$ as "a quite common case."1 A recent author writes, "simple curves which approximate a large number of empirical data are the parabolic and hyperbolic curves. The equation of such a curve is $y = ax^b$ [$y = bx^m$], parabolic for $b$ positive and hyperbolic for $b$ negative."2 A widely used text on elementary mathematics speaks of the equation $y = bx^m$ as one of "the three fundamental functions" in practical mathematics.3 The market for "logarithmic paper" shows what a large number of two-variable relationships may be approximated by this equation. Moreover, this equation is often a close first approximation to a rational law. Witness "Boyle's Law." Indeed, sufficient use has not been made of this curve in economic discussions of two-variable problems.
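Taking logarithms shows why $y = bx^m$ plots as a straight line on a double log scale: $\log y = \log b + m \log x$. The slope on logarithmic paper is therefore the exponent $m$, and the height at $x = 1$ gives $b$. The sketch below uses Boyle's law, under which pressure varies inversely with volume ($m = -1$). The constant `K` and the volumes are made-up values chosen only for illustration.

```python
import math

K = 2.5  # hypothetical constant: pressure * volume at fixed temperature
volumes = [0.5, 1.0, 2.0, 4.0, 8.0]
pressures = [K / v for v in volumes]  # Boyle's law, p = K * V**-1

points = list(zip(volumes, pressures))
# On a double log scale the slope between any two points is the exponent m.
slopes = [
    (math.log10(p2) - math.log10(p1)) / (math.log10(v2) - math.log10(v1))
    for (v1, p1), (v2, p2) in zip(points, points[1:])
]
# Where log10 V = 0 (V = 1) the line passes through log10 b, so b = p(1).
b = pressures[volumes.index(1.0)]
print("slopes:", [round(s, 6) for s in slopes], "b =", b)
```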

The primary reason why approximation to linearity on a double log scale has no such
significance in the case of the tail of a frequency distribution as it often has in the case
of a two-variable problem is the very fact that we are considering the tail of the
distribution, in other words, a mere fraction of the data. While frequency distributions
which can be described throughout their length by a curve of the type $y = bx^m$ are
extremely rare, a large percentage of all frequency distributions have tails approximating
straight lines on a double log scale.⁴ It is astonishing how many homogeneous frequency
distributions of all kinds may be described with a fair degree of adequacy by means of
hyperbolas⁵ fitted to the data on a double log scale. Along with this characteristic goes,
of course, the possibility of fitting to the tails of such distributions straight lines
approximately parallel to the asymptotes of the fitted hyperbola. However, we have by no
means adequately described an hyperbola when we have stated that one of its asymptotes is
(of course) a straight line and that its slope is such and such. Even had we similar
information concerning the other asymptote, we should know little about the hyperbola or
the frequency distribution which it would describe on a double log scale. The hyperbola
might coincide with its asymptotes and hence have an angle at the mode, or it might have a
very much rounded "top." Such a variation in the shape of the top of the hyperbola⁶ would
generally correspond to a very great variation in the scatter or "inequality" of the
distribution, as well as in many other characteristics.
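The point that two asymptotes do not fix a hyperbola can be made concrete. In the sketch below, $u$ stands for log income and $v$ for log frequency, and the curve is written as the conic $(v - L_1)(v - L_2) = k$, whose asymptotes are the lines $L_1$ and $L_2$ for every $k$. This parametrization, the function name, and the parameter values are assumptions for illustration; it is not the selected-points procedure the text mentions for Charts 28K and 28L.

```python
import math


def log_hyperbola(u, a1, s1, a2, s2, k):
    """Lower branch of the conic (v - L1) * (v - L2) = k on a double log scale.

    u is log income and v is log frequency.  L1 = a1 + s1*u is the rising
    asymptote and L2 = a2 - s2*u the falling one.  With k = 0 the curve is
    just the two straight lines, meeting in an angle at the mode; a positive
    k rounds off the "top" while leaving both asymptotes unchanged.
    """
    l1 = a1 + s1 * u
    l2 = a2 - s2 * u
    # Smaller root of v**2 - (l1 + l2)*v + (l1*l2 - k) = 0, i.e. the branch
    # lying below both asymptotes.
    return ((l1 + l2) - math.sqrt((l1 - l2) ** 2 + 4 * k)) / 2


# Hypothetical asymptotes: rising slope 4, falling slope 3, crossing at u = 1.
params = dict(a1=0.0, s1=4.0, a2=7.0, s2=3.0)
for k in (0.0, 0.5):
    # Height of the curve at the crossing point of the asymptotes.
    print(k, round(log_hyperbola(1.0, k=k, **params), 4))
# prints "0.0 4.0", then "0.5 3.2929"
```

Far from the mode both curves hug the same asymptotes, so straight lines fitted to their tails would agree, while the tops, and with them the scatter of the distributions, differ.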

1  D.  P.  Bartlett,  Method  of  Least  Squares,  p.  33. 

2  J.  Lipka,  Graphical  and  Mechanical  Computation,  p.  128. 

3  C.  S.  Slichter,  Elementary  Mathematical  Analysis,  preface. 

4  A  very  large  percentage  of  the  remainder  have  tails  approximating  straight  lines  on  a 
natural  x  log  y  basis. 

5  N.  B.  Not  a  straight  line  on  the  double  log  scale,  which  is  a  so-called  hyperbola  on  the 
natural  scale,  but  a  true  conic  section  hyperbola  on  the  double  log  scale. 

Charts  28K  and  28L  (Earnings  per  Hour  of  318,946  Male  Employees  in  1919)  illustrate 
how  excellent  a  fit  may  often  be  obtained  by  means  of  an  hyperbola  even  though  fitted  only 
by  selected  points.  A  comparison  of  the  least-squares  parabola  and  the  selected-points 
hyperbola  on  Chart  28K  illustrates  also  the  straight-tail  effect. 

6 Compare Karl Pearson's concept of "kurtosis."
CHART 28K

Rough similarity in the tails of two distributions on a double log scale by no means
proves even rough similarity in the remainder of the distributions. Charts 28M, 28N, 28O
and 28P illustrate both cumulatively and non-cumulatively on a double log scale two wages
distributions whose extreme tails appear roughly to approximate straight lines of about
equal slope.¹ Charts 28M and 28N are from data concerning wages per hour of 72,291 male
employees in the slaughtering and meat-packing industry in 1917;² Charts 28O and 28P are
from data concerning wages per hour of 180,096 male employees in 32 manufacturing
industries in the United States in 1900.³ A mere glance at the two non-cumulative
distributions will bring home the fact that while they show considerable similarity in the
upper income range tails, they are quite dissimilar in the remainder of the curves.

CHART 28N

1 The illustration shows only "rough similarity" in the extreme tails. However, there
seems no good reason for believing that even great similarity in the tails proves
similarity in the rest of the distribution. It certainly cannot do so in the case of
essentially heterogeneous distributions, such as income distributions.

2 Bureau of Labor Statistics, Bulletin No. 252.

3 Twelfth Census of the United States (1900), Special Report on Employees and Wages,
Davis R. Dewey.

Moreover, in spite of this similarity of tails, the slaughtering and meat-packing
distribution has a coefficient of variation of 30.5 while the manufacturing distribution
has a coefficient of variation of 47.7. In other words, the relative scatter or "inequality
of distribution" is more than one-and-a-half times as great in the manufacturing data as it
is in the slaughtering and meat-packing data. Furthermore, no discussion and explanation of
greater essential heterogeneity in the one distribution than in the other will offset the
fact that the tails are similar but the distributions are different. There seems indeed to
be almost no correlation between the slope of the upper-range tail and the degree of
scatter in wages distributions. Some distributions showing extremely great scatter have
very steep tails; others do not.¹ The frequency curve for the distribution of income in
Australia in 1915 is radically different from either the curve for the United States in
1910 constructed by Mr. W. I. King or the curve for the United States in 1918 constructed
by the National Bureau of Economic Research.
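The comparison of relative scatter above is a simple ratio of the two coefficients of variation. Assuming the figures are the usual percentage form $V = 100\,\sigma/\bar{x}$ (the text does not state the formula), the ratio works out as:

```latex
\frac{V_{\text{manufacturing}}}{V_{\text{meat-packing}}}
  = \frac{47.7}{30.5} \approx 1.56 > 1.5
```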

1 The tails of wage distributions have in general much greater slopes than those of the
upper (i. e., income-tax) range of income distributions. This is an outstanding difference
between the two distributions. Pareto's conclusions with respect to the convex appearance
of the curve for wages are consistent with curves showing number of dollars per income-tax
interval traceable to wages, but not with actual wage distributions showing number of
recipients per wage interval. Distributions based upon income from effort and
distributions based upon income from such sources (mostly profits and income from
property) as yield the higher incomes seem to have tails about equally close to straight
lines. Indeed, many wage distributions have tails more closely approximating straight
lines than do income-tax data.

Yet all three curves have tails on a double log scale quite as similar as is common with
income-tax returns.¹

From this discussion we may draw the corollary that it is futile to attempt to measure
changes in the inequality of distribution of income throughout its range by any function
of the mere tail of the income frequency distribution. It seems unnecessary, therefore, to
discuss Pareto's suggestions on this subject.
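The corollary can be checked on a toy example: attach one and the same upper tail to two different bodies and compare the coefficients of variation. All figures below are hypothetical, and each income group is treated as if every member received exactly the stated income, a simplifying assumption.

```python
import math


def coefficient_of_variation(groups):
    """Coefficient of variation (standard deviation / mean) of grouped data.

    `groups` is a list of (income, number_of_recipients) pairs; each group is
    treated as if all its members received exactly that income.
    """
    n = sum(f for _, f in groups)
    mean = sum(x * f for x, f in groups) / n
    var = sum(f * (x - mean) ** 2 for x, f in groups) / n
    return math.sqrt(var) / mean


# Hypothetical figures: an identical upper tail attached to two different bodies.
tail = [(5_000, 40), (10_000, 5), (20_000, 1)]
narrow_body = [(1_000, 500), (1_500, 400), (2_000, 100)]
wide_body = [(200, 300), (1_000, 300), (2_500, 400)]

dist_a = narrow_body + tail
dist_b = wide_body + tail
print(round(coefficient_of_variation(dist_a), 3), round(coefficient_of_variation(dist_b), 3))
```

The two distributions share every point of their upper tail, yet their relative dispersion differs substantially; multiplying by 100 gives the percentage form used in the text.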

4.  Is  it  probable  that  the  distribution  of  income  is  similar  enough 
from  year  to  year  in  the  same  country  to  make  the  formulation 
of  any  useful  general  "law"  possible? 

1  As  will  be  seen  in  Chapter  29,  there  seems  reason  for  believing  that  the  extreme  difference 
between  the  distribution  of  incomes  obtained  by  the  Australian  Census  and  the  estimate 
made  by  the  National  Bureau  of  Economic  Research  is  due  largely  to  difference  in  definition 
of  income  and  income  recipient.  However,  this  does  not  alter  the  fact  that  we  have  here 
again  two  distributions  with  tails  as  similar  as  is  usual  with  income-tax  distributions  and 
lower  ranges  about  as  different  as  it  is  possible  to  imagine. 
Before answering this question we must decide what we should mean by the word similar. If
income distributions for two years in the same country were such that each distribution
included the same individuals and each individual's income was twice as large in the second
year as it had been in the first year, it would seem reasonable to speak of the
distributions as strictly similar. If in a third year (because of a doubling of population
due to some hypothetical immigration) the number of persons receiving each specified income
size was exactly twice what it was in the second year, it would still seem reasonable to
speak of the distributions as strictly similar. Tested by any statistical criterion of
dispersion which takes account of relative size (such as the coefficient of variation), the
dispersion is precisely the same in each of the three years. Moreover, the three
distributions mentioned above¹ must necessarily have identically the same shape on a double
log scale, since multiplying every income, or every frequency, by a constant merely shifts
the whole curve horizontally or vertically on such a scale; and, furthermore, any two
distributions which have identically the same shape on a double log scale² must necessarily
have the same relative dispersion as measured by such indices as the coefficient of
variation, interquartile range divided by median, etc. Approximation to identity of shape on
a double log scale seems, then, a useful concept of "similarity." It is the concept implicit
in Pareto's work.³
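The three hypothetical years described above can be simulated directly: double every income, then double every frequency, and compare both the coefficient of variation and the plotted points on a double log scale. The first-year figures are invented for illustration, and grouped incomes are again treated as exact.

```python
import math


def coefficient_of_variation(groups):
    n = sum(f for _, f in groups)
    mean = sum(x * f for x, f in groups) / n
    var = sum(f * (x - mean) ** 2 for x, f in groups) / n
    return math.sqrt(var) / mean


def log_points(groups):
    """Points (log income, log frequency) as plotted on a double log scale."""
    return [(math.log10(x), math.log10(f)) for x, f in groups]


# Hypothetical first-year distribution: (income, number of recipients).
year1 = [(500, 200), (1_000, 500), (2_000, 250), (5_000, 40), (10_000, 6)]
# Second year: every income doubled.  Third year: every frequency doubled as well.
year2 = [(2 * x, f) for x, f in year1]
year3 = [(x, 2 * f) for x, f in year2]

cvs = [coefficient_of_variation(d) for d in (year1, year2, year3)]

# Every point of the third-year curve is the first-year point moved by the
# same amount, so the curve is translated without any change of shape.
shifts = [
    (round(u3 - u1, 9), round(v3 - v1, 9))
    for (u1, v1), (u3, v3) in zip(log_points(year1), log_points(year3))
]
print(cvs)
print(set(shifts))  # one common shift of log10(2) in each direction
```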

1 Or, any distributions whose equations may be reduced to one another by substituting
$k_1x$ for $x$ and $k_2y$ for $y$.

2 The curve may be thought of as consisting of two parts, which, before reduction to
logarithms, would be (1) the positive income section and (2) the negative income section
with positive signs.

3 While approximate identity of shape on a natural scale, a natural x and log y scale, or
any other similar criterion would constitute a "law," no such approximate identity of shape
on such scales has yet been discovered, and it seems difficult to advance any very cogent
a priori reasons for expecting it.

In this connection we must remember that had we the exact figures for the entire frequency
curves of the distribution of income in the United States from year to year, if moreover we
could imagine definitions of income and income recipient which would be philosophically
satisfactory and statistically usable, and if further we managed year by year to describe
our data curves adequately by generalized mathematical frequency curves of more or less
complicated variety, we should not necessarily have arrived at any particularly valuable
results. Any series of data may be described to any specified degree of approximation by a
power series of the type $y = A + Bx + Cx^2 + Dx^3 + \cdots$, but such fit is purely
empirical and absolutely meaningless except as an illustration of Maclaurin's theorem in
the differential calculus. We might be able to describe each year's data rather well by one
of Karl Pearson's generalized frequency curves, but if the essential characteristics of the
curve (skewness, kurtosis, etc.) changed radically from year to year, description of the
data by such a curve might well give no clue whatever as to any "law." Not only might the
years be different, but the fits might be empirical. Professor Edgeworth has well said that
"a close fit of a curve to given statistics is not, per se and apart from a priori reasons,
a proof that the curve in question is the form proper to the matter in hand. The curve may
be adapted to the phenomena merely as the empirically justified system of cycles and
epicycles to the planetary movements, not like the ellipse, in favor of which there is the
Newtonian demonstration, as well as the Keplerian observations." Journal of the Royal
Statistical Society, vol. 59, p. 533.

Now we have already found considerable evidence that income distributions are not, to a
significant degree, similar in shape on a double log scale. The income-tax tails of income
distributions for different times and places neither approximate straight lines of constant
slope nor approximate one another; they are of distinctly different shapes. Moreover, such
tails do not show, in respect of their numbers of income recipients and total amounts of
income, any uniformity of relation to the total number of income recipients and total
amount of income in the country, even after adjustments have been made for variations in
population and average income.¹ Considerations such as these reinforce the conclusion which
we arrived at from an examination of wage distributions, namely, that there is little
necessary relation between the shape of the tail and the shape of the body of a frequency
distribution, and have led us to suspect that, even if the tails of income distributions
were practically identical in shape, it would be extremely dangerous to conclude therefrom
that the lower income ranges of the curves were in any way similar.

A  most  important  matter  remains  to  be  discussed.  What  right  have 
we  to  assume  that  the  heterogeneity  necessarily  inherent  in  all  income 
distribution  data  is  not  such  as  inevitably  to  preclude  not  only  uniformity 
of  shape  of  the  frequency  curve  from  year  to  year  and  country  to  country 
but  also  the  very  possibility  of  rational  mathematical  description  of  any 
kind  unless  based  upon  parts  rather  than  the  whole?  What  evidence  have 
we  as  to  the  extent  and  nature  of  heterogeneity  in  income  distribution 
data? 

In the first place we must remember that lower range incomes are predominantly from wages
and salaries, while upper range incomes are predominantly from rent, interest, dividends
and profits.² While 74.67 per cent of the total income reported in the United States in the
$1,000-$2,000 income interval in 1918 was traceable to wages and salaries, only 33.10 per
cent of the income in the $10,000-$20,000 interval was from those sources, and only 15.92
per cent of the income in the $100,000-$150,000 interval and 3.27 per cent of the income in
the over-$500,000 intervals. On the other hand, while only 1.93 per cent of the total income
reported in the $1,000-$2,000 interval in 1918 was traceable to dividends, 23.73 per cent
was so traceable in the $10,000-$20,000 interval, 43.18 per cent in the $100,000-$150,000
interval, and 59.44 per cent in the over-$500,000 intervals.³

1 Estimated per cent of total income received by highest 5% of income receivers in United
States:

    Year    Per cent
    1913       33
    1914       32
    1915       32
    1916       34
    1917       29
    1918       26
    1919       24

National Bureau of Economic Research, Income in the United States, vol. 1, p. 116.

2 Compare Professor A. L. Bowley's paper on "The British Super-Tax and the Distribution of
Income," Quarterly Journal of Economics, February, 1914.

3 Statistics of Income 1918, pp. 10 and 44.

While the reporting of dividends was almost certainly less complete in the lower than in the
upper income classes, the difference could not be sufficient to invalidate the general
conclusion. Lower range incomes are predominantly wage and salary incomes; upper range
incomes are not.

The difference in constitution of the income at the upper and lower ends of the distribution
is sufficient to justify the statement that most of the individuals going to make up the
lower income range of the frequency curve are wage earners, while the individuals going to
make up the upper income range are capitalists and entrepreneurs.¹ What do we know about the
shapes of these component distributions? Is the fundamental difference in their relative
positions on the income scale their only dissimilarity?

In any particular year the upper income tail of the frequency distribution of income among
capitalists and entrepreneurs seems not greatly different from the extreme upper income tail
of the frequency distribution of income among all classes. This is what we might expect. Not
only is the percentage of the total income in the extreme upper income ranges reported as
coming from wages and salaries small, but much of this so-called wages and salaries income
must be merely technical. For example, it is often highly "convenient" to pay "salary"
rather than dividends. Furthermore, in so far as the tail of the curve of distribution of
income among capitalists and entrepreneurs is not identical with the tail of the general
curve, it will show a smaller rather than a larger slope, because the percentage of the
number of persons in each income interval who are capitalists and entrepreneurs increases as
we pass from lower to higher incomes, so that their frequencies fall off less rapidly than
those for all persons.² Now the slopes of the straight lines fitted to the extreme tails of
non-cumulative income distributions on a double log scale fluctuate within a range of about
2.4 to 3.0.
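Why the general tail should approach, and never be flatter than, the capitalist-entrepreneurial tail can be illustrated with two hypothetical components, each a straight line on a double log scale. The local slope of their sum is a weighted average of the two component slopes, with weights equal to each component's share of the frequency at that income. The constants and slopes below are assumptions chosen only to make the effect visible.

```python
import math


def local_slope(density, x, ratio=1.01):
    """Slope of log density against log income just above x, as a positive number."""
    return -(math.log(density(x * ratio)) - math.log(density(x))) / math.log(ratio)


# Hypothetical component tails, each a straight line on a double log scale:
# numerous wage earners with a steep tail (slope 5.0) and fewer
# capitalists and entrepreneurs with a flat tail (slope 2.6).
def wage_earners(x):
    return 1e20 * x ** -5.0


def capitalists(x):
    return 1e10 * x ** -2.6


def all_recipients(x):
    return wage_earners(x) + capitalists(x)


for income in (1_000, 10_000, 100_000, 1_000_000):
    print(income, round(local_slope(all_recipients, income), 2))
```

The combined slope falls from nearly 5.0 at low incomes toward 2.6 at the top, where the flatter component supplies almost all of the frequency.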

The upper range tails of wages distributions tell an entirely different story. Aside from
surface irregularities often quite evidently traceable to concentration on certain round
numbers, the majority of wages distributions have tails which, on a double log scale, are
roughly linear.³ However, the slopes of straight lines fitted to these tails are much greater
than the slopes of corresponding straight lines fitted to income distribution tails.⁴ While
the slopes of income distribution tails range from about 2.4 to 3.0, the slopes of wages
distributions tails commonly range between 4.0 and 6.0. They seldom run below about 4.5;
they sometimes run as high as 10.0 and 11.0.

1 Many individuals in the middle income ranges must necessarily be difficult to classify.
This does not mean that the concept of heterogeneity is inapplicable. There are countries in
which the population is a mixture of Spanish, American Indian, and Negro blood. Now such a
population must, for many statistical purposes, be considered extremely heterogeneous even
though the percentage of the population which is of any pure blood be quite negligible.

2 In 1917, the only year in which returns are classified according to "principal source of
income" (wages and salaries, income from business, income from investment), the difference
in slope, in the income range $100,000 to $2,000,000, between the distribution for all
returns and the distribution for those returns which did not report wages and salaries as
their principal source of income was less than .05. The slope in this range of the line
fitted to all returns was about 2.64; the business and investment line was about 2.59 and
the wages line about 3.21. In 1916, the only year in which returns are classified according
to occupations, the distribution of income among capitalists shows a slope of only 2.08,
while public service employees (civil) show a slope of 2.70 and skilled and unskilled
laborers a slope of 2.74.

3 Attention has already been drawn to the fact that this is a characteristic of many
frequency distributions of various kinds.

4 A further difference between the upper range income distribution among capitalists and
entrepreneurs and the upper range of the distribution among all persons seems to be, from
the 1916 occupation distributions, that the distribution among all persons shows less of a
roll, i. e., is straighter.
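The text reports slopes of straight lines fitted to the tails but does not spell out the fitting procedure in this passage. One plausible sketch, under stated assumptions: plot each income class at its geometric midpoint, express its frequency per dollar of class width so unequal classes are comparable, and fit a least-squares line. The function names and the synthetic classes are hypothetical.

```python
import math


def tail_slope(intervals):
    """Slope of a least-squares line fitted to a tail on a double log scale.

    `intervals` holds (lower, upper, count) income classes.  Each class is
    plotted at its geometric midpoint, with its frequency expressed per dollar
    of class width so that unequal classes are comparable.  The slope is
    returned as a positive number, following the usage in the text.
    """
    us = [math.log10(math.sqrt(lo * hi)) for lo, hi, _ in intervals]
    vs = [math.log10(n / (hi - lo)) for lo, hi, n in intervals]
    u_bar, v_bar = sum(us) / len(us), sum(vs) / len(vs)
    sxx = sum((u - u_bar) ** 2 for u in us)
    sxy = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    return -sxy / sxx


def synthetic_tail(s, a, lows):
    """Hypothetical classes [x, 2x) whose counts follow the density a * x**-s exactly."""
    return [(x, 2 * x, a * (x ** (1 - s) - (2 * x) ** (1 - s)) / (s - 1)) for x in lows]


lows = [10_000, 20_000, 40_000, 80_000, 160_000]
print(round(tail_slope(synthetic_tail(2.6, 1e9, lows)), 6))   # 2.6, an income-like tail
print(round(tail_slope(synthetic_tail(5.0, 1e20, lows)), 6))  # 5.0, a wage-like tail
```

With classes whose upper limit is a fixed multiple of the lower, an exact power-law density gives points lying exactly on a line, so the sketch recovers the slope used to generate the data.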

A distribution of wages per hour for 26,183 male employees in iron and steel mills in the
United States in 1900¹ shows a tail with a slope of about 3.35. However, the total of which
this is a part, the distribution of wages per hour among 180,096 male employees in 32
manufacturing industries in 1900, shows a tail-slope of about 4.8. The estimated
distribution of weekly earnings of 5,470,321 wage earners in the United States in 1905²
shows a tail-slope of about 5.0. The distribution of earnings per hour among 318,946 male
employees in 29 different industries in the United States in 1919³ shows a tail-slope of
about 5.86. The distribution of wages per month among 1,939,399 railroad employees in the
United States in 1917⁴ shows a tail-slope of about 6.25. The distribution of wages per hour
among 43,343 male employees in the foundries and metal working industry of the United
States in 1900⁵ shows a tail-slope of about 7.8. The distribution of earnings in a week
among 9,633 male employees in the woodworking industry (agricultural implements) in the
United States in 1900⁶ shows a tail-slope of over 11.0. At the other extreme was the case of
the wages-per-hour distribution among 26,183 male employees in American iron and steel mills
in 1900, with a slope of 3.35. Both 11.0 and 3.35 are exceptional, but the available data
make it clear that wages distributions of either earnings or rates have tail-slopes which
are always much greater than the maximum tail-slope of income distributions.

The illustrations in the preceding paragraph are illustrations of the tail-slopes of wages
distributions among wage earners. However, all the evidence points to frequency
distributions of income among wage earners having tail-slopes only very slightly less steep
than the tail-slopes of wages distributions. We have almost no usable data concerning the
relation between individual wage distributions and income distributions for the same
individuals, but we have a few samples showing the relation between family earnings
distributions and family income distributions.⁷ Moreover, we can without great risk base
certain extremely general conclusions concerning individual wage-earners' income
distributions on these family data.

1 Twelfth Census of the United States (1900), Special Report on Employees and Wages,
Davis R. Dewey.

2 1905 Census of Manufacturers, Part IV, p. 647.

3 Monthly Labor Review, Sept., 1919.

4 Report of the Railroad Wage Commission to the Director General of Railroads, 1919, p. 96.

5 Twelfth Census of the United States (1900), Special Report on Employees and Wages,
Davis R. Dewey.

6 Twelfth Census of the United States (1900), Special Report on Employees and Wages,
Davis R. Dewey.

7 The reader must not confuse the percentage of the income not derived from wages going to
wage-earners in any particular income class with the percentage of the income not derived
from wages going to all income recipients in any particular income class. Some of these last
recipients are not wage earners at all; they receive no wages. Information concerning the
second of these relations, but not the first, is given in the income tax reports.

The upper tails of the family-wage distributions are the tails of the wage distributions for
the individuals who are the heads of the families. This is apparent from an analysis of the
samples. Now income from rents and investments belongs almost totally to heads of families.
Such income is, however, so small in amount that it cannot alter appreciably the slope of
the tail.¹ While income from other sources than rents and investments (lodgers, garden and
poultry, gifts and miscellaneous) may not be so confidently placed to the credit of the head
of the family, this item changes its percentage relation to the total income so slowly as to
be negligible in its effect upon the tail-slope of the distribution.² Notwithstanding the
danger of reasoning too assuredly about individuals from these picked family distributions,
we seem justified in believing that the tail-slopes of income distributions among individual
wage earners are not very different from the tail-slopes of wage distributions among the
same individuals.³ The upper tails of income distributions among typical wage earners may
then be assumed to have much greater slopes than the upper tails of income distributions
among capitalists and entrepreneurs.

1 For example, in the report on the incomes of 12,096 white families published in the
Monthly Labor Review for December, 1919, we find the income from rents and investments less
than one per cent of the total family income for each of the income intervals.

    Income group        Percentage that income from rents
                        and investments is of total income
    Under $900                    .079
    $900-$1,200                   .176
    $1,200-$1,500                 .410
    $1,500-$1,800                 .551
    $1,800-$2,100                 .606
    $2,100-$2,500                 .998
    $2,500 and over               .778

2 As a somewhat extreme example, the Bureau of Labor investigation mentioned in the preceding
note shows the following relations between total family earnings and total family income
(including income from rents and investments, lodgers, garden and poultry, gifts and
miscellaneous).

    Income group        Percentage that total
                        earnings are of total income
    Under $900                    96.2
    $900-$1,200                   96.5
    $1,200-$1,500                 96.3
    $1,500-$1,800                 96.0
    $1,800-$2,100                 96.3
    $2,100-$2,500                 95.1
    $2,500 and over               96.2

3 Further corroboratory evidence, of some slight importance, that the tail-slopes of wage
distributions among wage earners are not very different from the tail-slopes of income
distributions among wage earners is yielded by the fact that the tail-slopes of income
distributions among families (which are virtually identical with the tail-slopes of both
income and wage distributions among the heads of these families) have roughly the same
range as the tail-slopes of wage distributions among individuals. The British investigation
into the incomes of 7,616 workingmen's families in the United States in 1909 shows a
tail-slope of about 3.5. (Report of the British Board of Trade on Cost of Living in American
Towns, 1911. [Cd. 5609], p. XLIV.) The Bureau of Labor's investigation into the income of
12,096 white families in 1919 shows a tail-slope of about 4.0. Mr. Arthur T. Emery's
extremely careful investigation into the incomes of 2,000 Chicago households in 1918 shows a
tail-slope of about 4.4. At the other extreme we find that the Bureau of Labor's
investigation into the income of 11,156 families in 1903 (Eighteenth Annual Report of the
Commissioner of Labor, 1903, p. 558) shows a tail-slope of about 10.0, and that Mr. R. C.
Chapin's investigation into the income of 391 workingmen's families in New York City
(Standard of Living Among Workingmen's Families in New York City, p. 44) also shows a slope
of about 10.0. The tails of these last two cases are very irregular, so that the slope
itself is not determinable with much precision.

It does not seem possible to make any very definite statement concerning the body and lower
tail of the capitalist and entrepreneurial distribution, even in so far as that term is a
significant one.¹ All the evidence suggests that the mode of what we have termed the
capitalist-entrepreneurial distribution is consistently higher than the wage-earners' mode.²
Its lower income tail undoubtedly reaches out into the negative income range, which the tail
of the wage-earners' distribution may, both a priori and from evidence, be assumed not to do.
It seems a not irrational conclusion, then, to speak of the capitalist-entrepreneurial
distribution as having a lesser tail-slope than the wage-earners' distribution on the lower
income side as well as on the upper income side,³ and, as a corollary, almost certainly a
much greater dispersion, both actual and relative, than the wage-earners' distribution.
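The family-income argument above relies on a property of the double log scale that can be stated in two lines. Assume, purely for illustration, that the tail of the earnings distribution is exactly $f(x) = A x^{-s}$ and that earnings are the same fraction $p$ of total income $y$ in every class, so $y = x/p$. Changing variables gives:

```latex
g(y) = p\,f(py) = A\,p^{1-s}\,y^{-s},
\qquad
\log g(y) = \log\!\left(A\,p^{1-s}\right) - s \log y .
```

The intercept moves but the slope $s$ is unchanged; a share $p$ that drifts only slowly across classes, as in the second table above, can therefore alter the slope only slightly.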

Though the above generalizations concerning differences between the wage-earners' income
distribution and the capitalist-entrepreneurial income distribution seem sound, they tell
but a fraction of the story. Aside from the difficulty of classifying all income recipients
in one or the other of these two classes, we are faced with the further fact that
investigation suggests that our two component distributions are themselves exceedingly
heterogeneous.⁴ We have already noted that wage distributions for different occupations and
times are extremely dissimilar in shape, and we suspect that the same applies to
capitalist-entrepreneurial distributions. For example, what little data we possess suggest
that the distribution of income among farmers has little in common with other
entrepreneurial distributions.

Moreover, the component distributions, into which it would seem necessary to break up the
complete income distribution before any rational description would be possible, not only
have different shapes and different positions on the income scale (i. e., different modes,
arithmetic averages, etc.), but their positions relative to one another on the income scale
change from year to year.⁵
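The effect of shifting relative positions can be sketched with two hypothetical components: keep the shape of each fixed, move one along the income scale by multiplying all its incomes by a factor, and recompute the dispersion of the whole. The figures and names are invented for illustration; grouped incomes are treated as exact.

```python
import math


def mean_and_cv(groups):
    """Mean and coefficient of variation of (income, count) pairs."""
    n = sum(f for _, f in groups)
    mean = sum(x * f for x, f in groups) / n
    var = sum(f * (x - mean) ** 2 for x, f in groups) / n
    return mean, math.sqrt(var) / mean


# Hypothetical component distributions: (income, number of recipients).
wage_earners = [(600, 300), (1_000, 500), (1_500, 200)]
entrepreneurs = [(500, 50), (2_000, 100), (8_000, 30)]


def combined(scale):
    """Whole distribution when every entrepreneurial income is multiplied by `scale`."""
    return wage_earners + [(x * scale, f) for x, f in entrepreneurs]


for scale in (0.8, 1.0, 1.5):
    _, cv_component = mean_and_cv([(x * scale, f) for x, f in entrepreneurs])
    _, cv_total = mean_and_cv(combined(scale))
    print(scale, round(cv_component, 3), round(cv_total, 3))
```

The entrepreneurial component keeps exactly the same relative dispersion at every scale, yet the dispersion of the combined distribution changes markedly, so no fixed "law" for the whole could survive such shifts.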

1  In  the  total  income  curve  there  is  a  broad  twilight  zone  where  individuals  are  often  both 
wage  or  salary  earners  and  capitalists  or  even  entrepreneurs. 

2 In the 1916 occupation distributions the only occupations showing more returns for the
$4,000-$5,000 interval than for the $3,000-$4,000 interval (that is, the only occupations showing any
suggestion of a mode) are of a capitalistic or entrepreneurial description: bankers; stockbrokers;
insurance brokers; other brokers; hotel proprietors and restaurateurs; manufacturers;
merchants; storekeepers; jobbers; commission merchants, etc.; mine owners and mine operators;
saloon keepers; sportsmen and turfmen.

3 Of course the very word slope is an ambiguous term to use concerning the tail of a curve
which enters the second quadrant.

4 Evidence suggesting definite heterogeneity in the "wage and salary" figures of the income-tax
returns is presented in Chapter 30.

5 This fact is one of the simpler pieces of evidence against the existence of a "law." Of
course, even though the income distribution were made up of heterogeneous material, if the
Table 28Q 1 is interesting as showing the changes in the relative positions
of the arithmetic averages of different wage distributions in 1909, 1913
and 1918.

TABLE 28Q

CHANGES IN THE RELATIVE POSITIONS OF THE AVERAGE ANNUAL
EARNINGS OF EMPLOYEES ENGAGED IN VARIOUS INDUSTRIES

(Average for all industries = 100)

Industry                                1909    1913    1918

All Industries                         100.0   100.0   100.0
Agriculture                             48.2    45.4    54.7
Production of Minerals                  95.7   104.4   119.0
Manufacturing:
  Factories                             91.2    97.5   105.5
  Hand Trades                          111.7   103.5   110.8
All Transportation                     104.9   105.4   119.3
  Railway, Express, Pullman, Switching and
    Terminal Cos.                      104.0   108.2   129.3
  Street Railway, Electric Light and Power,
    Telegraph and Telephone Cos.        99.5    93.8    81.4
  Transportation by Water              123.5   114.1   147.5
Banking                                123.0   128.6   135.5
Government                             118.1   113.8    83.0
Unclassified Industries                114.4   107.7    97.8


The  data  are  so  inadequate  that  the  construction  of  a  similar  table  for 
capitalist-entrepreneurial  distributions  is  not  feasible.  However,  there  are 
comparatively  good  figures  for  total  income  of  farmers  and  total  number 
of  farmers  year  by  year.2  The  average  incomes  of  farmers,  year  by  year, 
were  the  following  percentages  of  the  estimated  average  incomes  of  all 
persons  gainfully  employed  in  the  country. 

Year  Percentage

1910       75.19
1911       69.13
1912       72.41
1913       74.88
1914       76.33
1915       80.45
1916       82.85
1917      104.51
1918      109.68
1919      103.95
1920       63.88

This is a wide range: from 63.88 per cent in 1920 to 109.68 per cent in 1918.

Exactly  what  effects  have  such  internal  movements  of  the  component 
distributions  upon  the  total  income  frequency  distribution  curve?  This 
is  a  difficult  question  to  answer  as  we  have  not  sufficient  data  to  break 

component parts remained constant in shape and in their relative positions with respect to one
another on the income scale, these relations would of themselves constitute a "law."

1  Based  upon  Income  in  the  United  States,  Vol.  I,  pp.  102  and  103. 

2  See  Income  in  the  United  States,  Vol.  I,  p.  112. 
down the total, composite, curve into its component parts with any degree
of confidence.1 However, the movements of wages in recent years
would appear to give us a clue to the sort of phenomena we might expect
to find if we had complete and adequate data.

The  slopes  of  the  upper  income  tails  of  wages  distributions  are  great, 
4  to  5  or  more.2  Now  the  wage  curve  moved  up  strongly  from  1917  to 
1918  if  we  may  judge  by  averages.  The  average  wage  of  all  wage  earners 
in  the  United  States  3  increased  15.6  per  cent 4  from  1917  to  1918.  During 
the same period the average income of farmers increased 19.1 per cent 5
and  the  average  income  of  persons  other  than  wage  earners  and  farmers 
remained  nearly  constant.  Total  amounts  of  income  by  sources  in  millions 
of  dollars  were: 


                            1917      1918     Percentage 1918
                                                   was of 1917

Total Wages a            $27,795   $32,575              117.20
Total Farmers' Income      8,800    10,500              119.32
All other Income          17,265    17,291              100.15

Total Income             $53,860   $60,366              112.08


a  Includes  pensions,  etc.,  and  includes  soldiers,  sailors,  and  marines. 
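As a quick arithmetic check (a sketch added for illustration, not part of the original computation), the totals and the last column of the table above can be recomputed in a few lines of Python. The dictionary layout and the function names are illustrative only. The same helper reproduces the 15.6 and 19.1 per cent increases in average wages and in farmers' average incomes from the per annum dollar figures given in the footnotes.

```python
# Income by sources, millions of dollars: (1917, 1918), from the table above.
INCOME = {
    "Total wages": (27_795, 32_575),
    "Total farmers' income": (8_800, 10_500),
    "All other income": (17_265, 17_291),
}


def percent_of(base: float, later: float) -> float:
    """`later` expressed as a percentage of `base`, to two decimals."""
    return round(100 * later / base, 2)


def percent_increase(base: float, later: float) -> float:
    """Percentage increase from `base` to `later`, to one decimal."""
    return round(100 * (later / base - 1), 1)


total_1917 = sum(y17 for y17, _ in INCOME.values())
total_1918 = sum(y18 for _, y18 in INCOME.values())

for source, (y17, y18) in INCOME.items():
    print(f"{source:<24}{percent_of(y17, y18):>8.2f}")
print(f"{'Total income':<24}{percent_of(total_1917, total_1918):>8.2f}")

# Average annual incomes in dollars, as quoted in the footnotes.
wage_increase = percent_increase(945, 1_092)      # all wage earners
farmer_increase = percent_increase(1_370, 1_632)  # farmers
print(wage_increase, farmer_increase)
```

Rounding is applied only at the end of each calculation, as a table compiler would, so the printed figures match the published column to the last digit.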

Stockholders in corporations saw income from that source actually decline
from 1917 to 1918.6 What happened to American income-tax returns
during this time?

1 The processes by which the income distribution curve published in Income in the United
States, Vol. I, pp. 132-135 was arrived at were such that to use that material here would
practically amount to circular reasoning. The conclusions arrived at here were used in building
up that curve.

2 The slope of the tail of the wage and salary curve in the 1917 income tax returns is only
about 3.21 (compare note 2, p. 377). However, we must remember that the individuals there
classified are largely of an entirely different type of "wage-earner" from those in the lower
groups. In this upper group occur the salaried entrepreneurs, professional men, etc., and
those whose "salaries" are really profits or dividends. The evidence points to a rather distinct
and significant heterogeneity along this division in the wage and salary distribution.
See Chapter 30.

3 Excluding soldiers, sailors, and marines, and professional classes but including officials
and "salaried entrepreneurs."

4 From $945 per annum in 1917 to $1,092 per annum in 1918.

5 From $1,370 per annum in 1917 to $1,632 per annum in 1918.


6 CORPORATION DIVIDENDS, SURPLUS AND EARNINGS

(In millions of dollars)

        Dividends    Surplus    Net earnings

1917        3,995      3,963           7,958
1918        2,568      1,945           4,513

See page 324.
TOTAL AMOUNT OF NET INCOME RETURNED BY SOURCES (RETURNS
REPORTING OVER $2,000 PER ANNUM NET INCOME) a

(Millions of dollars)

               Wages and salaries  All other sources b
Income class        1917      1918      1917      1918

Over $2,000       $3,648    $6,493    $7,543    $7,198
2,000-4,000        1,553     3,687     1,799     2,036
4,000-5,000          301       703       528       736
5,000-10,000         661       849     1,167     1,296
Over 10,000        1,133     1,254     4,049     3,130

a Wages income from returns reporting between $1,000 and $2,000 per annum is not available
for 1917.

b "Other sources" are total net income minus wages and salaries, i. e., total general deductions
have been assumed as deductible from other sources (gross). All things considered,
this  seems  proper  here  though  it  may  easily  be  criticised.  In  connection  with  changes  in  the 
relation  between  net  and  gross  income  from  1917  to  1918  see  Chapter  30,  pp.  401  and  402. 
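The growth rates quoted in the next paragraph follow directly from the table above. A minimal Python sketch (the names are illustrative) recomputes them and also shows that the increase in wages and salaries was largest in the lowest income class and smallest in the highest.

```python
# Net income returned, millions of dollars, by income class:
# $2,000-4,000, $4,000-5,000, $5,000-10,000, over $10,000.
WAGES = {1917: [1_553, 301, 661, 1_133], 1918: [3_687, 703, 849, 1_254]}
OTHER = {1917: [1_799, 528, 1_167, 4_049], 1918: [2_036, 736, 1_296, 3_130]}


def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`, to one decimal."""
    return round(100 * (new / old - 1), 1)


# The class figures add up to the "Over $2,000" row of the table.
assert sum(WAGES[1917]) == 3_648 and sum(WAGES[1918]) == 6_493
assert sum(OTHER[1917]) == 7_543 and sum(OTHER[1918]) == 7_198

wages_change = pct_change(sum(WAGES[1917]), sum(WAGES[1918]))
other_change = pct_change(sum(OTHER[1917]), sum(OTHER[1918]))
wages_by_class = [pct_change(a, b) for a, b in zip(WAGES[1917], WAGES[1918])]

print(wages_change, other_change)  # overall change for each source
print(wages_by_class)              # change in wages, lowest class first
```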

While  reported  income  from  all  other  sources  than  wages  and  salaries 
declined  4.6  per  cent,1  reported  income  from  wages  and  salaries  increased 
78.0  per  cent.2  Moreover,  the  great  increases  in  wages  and  salaries  were 
in  the  lowest  intervals.  The  wage  curve  with  its  steep  tail-slope  was 
moving  over  into  the  income  tax  ranges.3  The  effect  upon  the  total  curve 
is  very  pronounced,  as  may  be  seen  from  Table  28R. 

TABLE 28R

AMERICAN INCOME TAX RETURNS IN 1917 AND 1918

Total Number of Returns
(In thousands)

Income class          1917      1918   Percentage 1918
                                           was of 1917

$2,000-$4,000        1,214     2,107            173.56
4,000-5,000            186       322            173.12
5,000-10,000           271       319            117.71
Over 10,000            162       160             98.77


On  a  double  log  scale  we  see  the  curve  changing  its  shape  radically.  While 
the  1917  curve  is  comparatively  smooth  and  regular,  the  1918  curve 
develops  a  distinct  "bulge"  in  the  lower  ranges.4 
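The change of shape can be made concrete with a rough calculation from Table 28R. The sketch below assumes Pareto's convention of plotting the number of returns above each income level against that level on a double log scale, and measures the slope between adjacent class limits. The author's slope figures elsewhere may refer to a differently constructed curve, so these values are not directly comparable with them. Under this convention every segment is steeper in 1918 than in 1917, and the steepening is greater below $10,000 than above it.

```python
import math

# Lower limits of the income classes in Table 28R (dollars); the last class
# is open-ended ("over $10,000").
BOUNDS = [2_000, 4_000, 5_000, 10_000]
# Number of returns in each class, thousands.
RETURNS = {1917: [1_214, 186, 271, 162], 1918: [2_107, 322, 319, 160]}


def counts_above(counts):
    """Cumulative number of returns at or above each class's lower limit."""
    totals, running = [], 0
    for c in reversed(counts):
        running += c
        totals.append(running)
    return totals[::-1]


def segment_slopes(xs, ns):
    """Negative slope of log N against log x between adjacent points."""
    return [
        -math.log10(n1 / n0) / math.log10(x1 / x0)
        for x0, x1, n0, n1 in zip(xs, xs[1:], ns, ns[1:])
    ]


slopes = {yr: segment_slopes(BOUNDS, counts_above(c)) for yr, c in RETURNS.items()}
for yr, s in slopes.items():
    print(yr, [round(v, 2) for v in s])
```

A single straight line fitted through such points would hide exactly the kind of change in curvature the text describes, which is why the segment-by-segment slopes are shown instead.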

The  preceding  discussion  has  been  concerned  with  equal  dollar-income 

1  Had  "other  sources"  been  taken  gross  instead  of  net,  that  item  would  have  shown  an 
increase  of  5.3  per  cent  instead  of  a  decrease  of  4.6  per  cent. 

2  The  actual  spread  is  still  greater  than  the  figures  show.  Income  from  professions,  which 
in  1917  was  classed  under  wages,  in  1918  and  1919  was  classed  under  business. 

3  This  seems  to  be  a  fact  though  it  is  not  the  whole  story.  The  "intensive  drive"  of  1919 
may  easily  account  for  some  of  the  increase.  See  Chapter  30  for  a  discussion  of  the  probable 
extent  of  this  influence. 

4 See Income in the United States, Vol. I, Charts 28 and 30.
intervals. However, $2,000 income in 1918 was relatively less than $2,000
income  in  1917.  The  average  (per  capita)  income  of  the  country  was 
$523  in  1917  and  $586  in  1918.1  The  adjustment  is  theoretically  crude, 
but  $2,241 2  in  1918  might  be  considered  as  in  one  sense  equivalent  to 
$2,000  in  1917.  The  results  of  comparisons  of  the  two  years  upon  this 
basis  are  given  in  Table  28S.3 
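The adjustment amounts to rescaling by the ratio of the two per capita incomes, 586/523. The following Python sketch (the function names are hypothetical) reproduces the 1918 class limits used in Table 28S and the reduction of two 1918 entries of that table, wages and salaries ($3,236 million) and total net income ($5,359 million) in the lowest class, to "1917 dollars."

```python
# Average (per capita) income of the country, dollars.
PER_CAPITA = {1917: 523, 1918: 586}


def to_1918_level(amount_1917: float) -> int:
    """A 1917 income level scaled up by the ratio of per capita incomes."""
    return round(amount_1917 * PER_CAPITA[1918] / PER_CAPITA[1917])


def to_1917_dollars(amount_1918: float) -> int:
    """A 1918 amount multiplied by 523/586, i.e. reduced to '1917 dollars'."""
    return round(amount_1918 * PER_CAPITA[1917] / PER_CAPITA[1918])


limits_1918 = [to_1918_level(x) for x in (2_000, 4_000, 5_000, 10_000)]
print(limits_1918)
print(to_1917_dollars(3_236), to_1917_dollars(5_359))
```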

TABLE 28S

INCOME RETURNED, BY SOURCES

(Millions of dollars)

Columns: (1) wages and salaries; (2) total net income; (3) total net
income minus wages and salaries; (4) total gross income; (5) total gross
income minus wages and salaries.

Income class            (1)      (2)      (3)      (4)      (5)

1917

$2,000-$4,000         1,553    3,352    1,799    3,713    2,161
4,000-5,000             301      829      528      895      594
5,000-10,000            661    1,828    1,167    1,951    1,290
Over 10,000           1,133    5,182    4,049    5,518    4,384

1918

$2,241-$4,482         3,236    5,359    2,123    5,766    2,530
4,482-5,602             498    1,111      613    1,247      749
5,602-11,205            773    1,960    1,187    2,315    1,542
Over 11,205           1,153    4,129    2,976    4,842    3,689

1918, multiplied by 523/586, that is, reduced to "1917 dollars"

$2,241-$4,482         2,888    4,783    1,895    5,146    2,258
4,482-5,602             445      992      547    1,113      668
5,602-11,205            690    1,749    1,059    2,066    1,376
Over 11,205           1,029    3,685    2,656    4,321    3,292

(Percentages of Total Income of Country)

1917

$2,000-$4,000          2.88     6.22     3.34     6.89     4.01
4,000-5,000             .56     1.54      .98     1.66     1.10
5,000-10,000           1.23     3.39     2.16     3.62     2.39
Over 10,000            2.10     9.61     7.51    10.24     8.14

1918

$2,241-$4,482          5.30     8.78     3.48     9.45     4.15
4,482-5,602             .82     1.82     1.00     2.05     1.23
5,602-11,205           1.27     3.21     1.94     3.80     2.53
Over 11,205            1.89     6.77     4.88     7.94     6.05

1  Income  in  the  United  States,  Vol.  I,  p.  76. 

2 $2,000 × 586/523 ≈ $2,241.

3 The figures for the amounts of income in the irregular 1918 income intervals of that table
($2,241-$4,482, etc.) were calculated by straight-line interpolation on a double log scale applied
to the even thousand dollar intervals of the income-tax returns. Though the total
income curve does not approximate linearity, it may be assumed linear within the small
range  of  one  income  tax  interval  without  serious  error. 
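Straight-line interpolation on a double log scale amounts to assuming a power law, y = c·x^k, between two tabulated points. A minimal Python sketch follows. The footnote does not say exactly which quantity was interpolated (for instance, the cumulative amount of income above each limit), so the function is written generically, and its check uses a made-up power law rather than the income-tax figures.

```python
import math


def loglog_interp(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1) when both
    axes are logarithmic, i.e. assuming y = c * x**k between the two points."""
    t = math.log(x / x0) / math.log(x1 / x0)
    return y0 * (y1 / y0) ** t


def power_law(x):
    """Illustrative curve only; not fitted to any data in the text."""
    return 5e6 * x ** -1.5


# Estimate the value at the 1918 limit $2,241 from the even limits $2,000 and $3,000.
estimate = loglog_interp(2_241, 2_000, power_law(2_000), 3_000, power_law(3_000))
print(estimate, power_law(2_241))
```

On a curve that really is a power law between the two limits the estimate is exact; on a bent curve the error grows with the width of the interval, which is the reason for restricting the assumption to a single income-tax interval.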


PARETO'S  LAW 


385 


(Table 28S concluded.)

NUMBER OF RETURNS

(Thousands)

Income class     1917   Income class     1918    Percentage 1918
                                                     was of 1917

$2,000-$4,000   1,214   $2,241-$4,482   1,758             144.81
4,000-5,000       186   4,482-5,602       220             118.28
5,000-10,000      271   5,602-11,205      260              95.94
Over 10,000       162   Over 11,205       136              83.95

It  is  from  this  table  once  again  apparent  that  the  wage  distribution  moved 
independently  up  on  the  income  scale  and  that  the  effect  of  this  movement 
was  confined  to  the  lowest  income  intervals.  Charts  28T,  28U,  28V,  28W, 
28X,  28Y,  28Z,  and  28AA  which  show  the  number  of  dollars  income  per 
dollar-income interval, by sources, are enlightening as illustrating in still
greater detail the changes in the constitution of the returns from year to
year.

[Charts 28T to 28AA: U.S. Income Tax Returns, 1916 to 1919. Number of dollars in each income interval, by sources; scales logarithmic; income in thousands of dollars. Curves: 1. Total income; 2. Wages; 3. Business; 4. Income other than wages or business; 5. Rents; 6. Interest; 7. Dividends.]

Such  material  and  the  appearance  of  the  "bulge"  on  the  income-tax 
curve  in  the  lowest  income  ranges  1  in  the  years  1918  and  1919  when  wages 
and  salaries  were  high  and  average  (per  capita)  incomes  also  high  2  strongly 
suggest  that  the  income  curve,  in  so  far  as  it  shows  any  similarity  from 
year  to  year,  changes  its  general  appearance  and  turns  up  (on  a  double 
log  scale)  as  it  approaches  those  ranges  where  wages  and  salaries  are  of 
predominant  influence.3  The  great  slopes  of  wage  distributions  are  on 
this  hypothesis  not  inconsistent  with  the  smaller  slope  of  the  general 
income  curve  in  its  higher  (income-tax)  ranges.4 
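The hypothesis can be illustrated with a purely hypothetical mixture of two Pareto-type components: a numerous group with a steep tail (standing in for wages and salaries) and a smaller group with a flatter tail (standing in for other income). The parameters below are invented for illustration and are not estimates from any data in this study. On a double log scale the combined curve is steep in the lower range, where the first group predominates, and flattens toward the second group's slope higher up.

```python
import math

# Hypothetical two-component mixture; parameters are illustrative only.
X_MIN = 1_000.0
COMPONENTS = [
    (1_000.0, 4.5),  # (number above X_MIN, Pareto slope): steep "wage" tail
    (100.0, 1.5),    # same, for a flatter "other income" tail
]


def survivors(x: float) -> float:
    """Number of recipients with income above x in the mixture."""
    return sum(n * (x / X_MIN) ** -a for n, a in COMPONENTS)


def local_slope(x: float, h: float = 1e-4) -> float:
    """Minus the slope of log N(>x) against log x, by a central difference."""
    lo, hi = x * math.exp(-h), x * math.exp(h)
    return -(math.log(survivors(hi)) - math.log(survivors(lo))) / (2 * h)


for x in (1_000, 3_000, 10_000, 100_000):
    print(f"{x:>8,}  slope = {local_slope(x):.2f}")
```

The local slope falls from over 4 near $1,000 to about 1.5 at $100,000, so a steep component can coexist with a much flatter overall curve in the upper ranges.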

Conclusions: 

(1)  Pareto's  Law  is  quite  inadequate  as  a  mathematical  generalization, 
for  the  following  reasons: 

(a)  The  tails  of  the  distributions  on  a  double  log  scale  are  not, 
in  a  significant  degree,  linear; 

(b) They could be much more nearly linear than they are without
that condition being especially significant, as so many distributions
of various kinds have tails roughly approaching
linearity;

(c) The straight lines fitted to the tails do not show even approximately
constant slopes from year to year or between country
and country;

(d)  The  tails  are  not  only  not  straight  lines  of  constant  slope  but 
are  not  of  the  same  shape  from  year  to  year  or  between 
country  and  country. 

(2)  It  seems  unlikely  that  any  useful  mathematical  law  describing  the 
entire  distribution  can  ever  be  formulated,  because: 

(a) Changes in the shape of the income curve from year to year
seem traceable in considerable measure to the evident heterogeneity
of the data;

(b)  Because  of  such  heterogeneity  it  seems  useless  to  attempt  to 

1  See  Chapter  30  for  further  discussion  of  this  "bulge"  in  connection  with  an  examination 
of  how  far  it  may  be  the  result  of  irregularity  in  reporting. 

2  Average  (per  capita)  incomes  being  high  means  that  a  definite  money  income  (such  as 
$2,000)  takes  us  relatively  further  down  the  income  curve  than  if  average  incomes  were  low. 

3  It  is  difficult  to  say  just  where  the  "bulge"  might  have  appeared  in  the  1917  distribution 
if  as  great  efforts  had  been  made  to  obtain  correct  returns  in  that  year  as  were  made  under 
the  "intensive  drive"  for  1918  returns.  The  wages  line  on  the  1917  number  of  dollars  income 
per  dollar-income  interval  chart  (Chart  28V)  shows  signs  of  turning  up  somewhere  between 
$4,000  and  $5,000  and  the  business  line  somewhere  in  the  $5,000-$10,000  interval.  However 
neither  movement  is  large  nor  can  their  positions  be  accurately  determined  on  account  of  the 
size  of  the  reporting  intervals.    See  also  Chapter  30,  p.  412. 

4 The "bulge" on the income from wages and salaries curve itself, as seen in the income-tax
returns for 1918 and 1919 (see Charts 28X and 28Z), seems the result of heterogeneity in
these wage and salary data themselves. This hypothesis is considered in Chapter 30.
describe the whole distribution by any mathematical curve
designed to describe homogeneous distributions (as any simple
mathematical expression must almost necessarily be designed
to do);

(c)  Furthermore,  the  existing  data  are  not  adequate  to  break  up 
the  income  curve  into  its  constituent  elements; 

(d)  If  the  data  were  complete  and  adequate  we  might  still  remain 
in  our  present  position  of  knowing  next  to  nothing  of  the 
nature  of  any  "laws"  describing  the  elements.1 

(3)  Pareto's  conclusion  that  economic  welfare  can  be  increased  only 
through  increased  production  is  based  upon  erroneous  premises. 
The  income  curve  is  not  constant  in  shape.  The  internal  movements 
of  its  elements  strongly  suggest  the  possibility  of  important  changes 
in  distribution.  The  radically  different  mortality  curves  for  Roman 
Egypt  and  modern  England,2  and  the  decrease  in  infant  mortality 
in the last fifty years illustrate well what may happen to heterogeneous
distributions.

The  next  four  chapters  review  the  data  from  which  any  income  frequency 
distribution  for  the  United  States  must  be  constructed. 

1  Though  all  the  evidence  points  to  hope  of  further  progress  lying  in  the  analysis  of  the 
parts  rather  than  in  any  direct  attack  upon  the  unbroken  heterogeneous  whole. 
2 See Biometrika, Vol. I, pp. 261-264.


CHAPTER  29 
OFFICIAL  INCOME  CENSUSES 

There has never been a complete income census of the American people.
The Federal income-tax data cannot take the place of such a census. Respecting
the distribution of income among persons having incomes of less
than $1,000, Federal income-tax data give us no information whatsoever.
Furthermore, on account of the exemption of married persons, comparatively
little use can be made of the $1,000 to $2,000 interval. The number
of persons reporting incomes over $2,000 in our best year, 1918, was only
7.3 per cent of the estimated total number of income-recipients in the
country. Moreover, not only because of direct evasion and illegal non-reporting,
but also because of "legal evasion" and the large amount of
tax-exempt income which need not be reported at all, these income-tax
data cannot give an approximately correct picture of even that part of
the frequency curve which lies above $2,000. The adjustments of the
income-tax data necessary to obtain such a picture are extremely large,
as we shall presently see.

Only one country in the world has ever taken an official income census
which made any pretense of completeness. Under the War Census Act
the Commonwealth of Australia took an official income census of incomes
received during the year ended June 30, 1915, by everyone, man, woman,
or child, who was "possessed of property, or in receipt of income." 1 The
results of that census are summarized by G. H. Knibbs, the Commonwealth
Statistician, in The Private Wealth of Australia and its Growth. A Report
of the War Census of 1915. (See Table 29A and Charts 29A, 29B
and 29C.)

Now  while  it  would  naturally  be  impossible  to  construct  a  complete 
frequency  distribution  for  American  incomes  from  Australian  data,2  we 
might  perhaps  hope  to  discover  some  characteristics  of  income-distribution 

1  While  the  first  clause  of  the  Australian  "Wealth  and  Income  Card"  stated  merely  that 
it  was  "to  be  filled  in  by  all  persons  aged  18  or  upwards  possessed  of  property,  or  holding 
property  on  trust,  or  in  receipt  of  income,"  etc.  (p.  9),  "a  special  instruction  was  issued  that 
in  the  case  of  all  persons  under  the  age  of  18,  possessed  of  property,  or  in  receipt  of  income, 
a  return  must  be  furnished  by  the  parent  or  guardian  in  respect  of  such  property  or  income." 
(p.  10.)  The  income  from  such  trust  funds  was  not  all,  but  only  "in  the  main,"  allocated  to 
individual  beneficiaries,    (p.  22.) 

G.  H.  Knibbs,  The  Private  Wealth  of  Australia  and  its  Growth.  A  Report  of  the  War  Census 
of  1915. 

2  Aside  from  the  questionableness  of  such  a  procedure,  the  large  size  of  the  low  income 
intervals  in  the  Australian  distribution  and  the  lack  of  information  concerning  the  amount 
of  negative  income  make  that  distribution  a  difficult  one  to  work  with.  A  classification  by 
such  large  intervals  tells  very  little. 
curves in general from this, the only actual census ever taken. A knowledge
of such general characteristics might then, quite imaginably, be a
little helpful in the problem of describing the American or any other
income distribution.

However, when we come to examine the Australian figures, we find that
they have certain pronounced peculiarities which would be extremely difficult
to read into the American material. For example, the Australian distribution
shows a flatness and lack of pronounced mode totally unlike the
results we have built up from an analysis of American data. In the Australian
distribution there are nearly the same number of persons having
incomes between 0 and £50, £50 and £100, and £100 and £150.1

What are the causes of this rather startling peculiarity of the Australian
frequency curve? 2 In the first place let us suggest a possibly minor but
by no means necessarily negligible factor. We know little about the goodness
of the Australian reporting in this census. Income is, from its nature,
a difficult subject to investigate. When the material is collected by means
of schedules to be filled in by the informants, as was the case in the Australian
census, the returns may easily be full of errors. The average individual
is surprisingly ignorant concerning the amount of his total income.
The further fact that the census was taken in order to estimate possibilities
of future taxation may well have been a powerful incentive towards
great irregularities all along the line, but especially in the lower income
groups. Persons whose income brought them distinctly into the upper
groups (over £156) were, at the time of the income census, about to make
returns under oath for income-tax purposes and would hardly care to
show a radical discrepancy between the two returns. On the other hand,
many persons, whose true incomes were around £156 and the modal income,
might easily have "underestimated" with the idea of evading, if possible,
future taxation based upon a lowering of the exemption limit. The result
of such practices would tend to show up graphically in a flattening of the
curve in the vicinity of the mode of the distribution and a raising of the
numbers in the lowest groups.3

However, poor reporting is probably only a secondary element accounting
for the peculiarities of the Australian curve. It is most of all the

1 See Table 29A and Chart 29A.

2 Notwithstanding the fact that distributions for different times and for different countries
probably vary greatly (see Chapter 28), the difference between the Australian curve and
the Bureau's American estimate seems too radical to explain upon this basis.

3 It is difficult to determine the extent of actual non-reporting. The number of males
filling out income cards was 2,527,831. All males "possessed of property, or in receipt of
income" are supposed to be included in this number. It amounted, however, to only 54.60
per cent of the total male population. Males "possessed of property, or in receipt of income"
necessarily constitute a larger percentage of the total male population than do male "breadwinners,"
yet in the Australian census of 1911 male breadwinners constituted 69.4 per cent
of the total male population, and male breadwinners 20 years of age or older 58.9 per cent.
Even if we assume that the number of income returns for males under 18 was negligible, we
still are faced with a discrepancy difficult to account for.
concentration of female returns in the lowest income groups which gives
the flat and modeless appearance to the total curve. The Australian frequency
distribution among males only is much more like our estimated
American distribution 1 than is the Australian distribution among males
and females together. Now the concentration of female returns in the
lower income intervals would seem to be the result of a large number of
returns made by women and female children receiving petty incomes from
property who would be classified, in the Australian Census of Population,
as "dependents" and not as "breadwinners." 2

Of the total female population in 1915, 33.46 per cent made out income
cards and 23.18 per cent reported positive incomes (10.28 per cent reported
zero or negative incomes). But according to the Australian census
of 1911, only 18.6 per cent of the total female population were classified
as "breadwinners." Thus the women reporting positive incomes in 1915
constituted a much larger percentage of the total female population than
did female "breadwinners" in 1911 of the total female population in that
year. The discrepancy seems too great to be accounted for by the increase
in the number of women "breadwinners" caused by the war. More
than half of the 23.18 per cent of the female population reporting positive
incomes in 1915 reported incomes under £50 per annum. Moreover, the
average income of this group was only £22 per annum, below £25, the
midpoint of the interval. This strongly suggests petty incomes from
property, and part-time occupations such as keeping boarders, lodgers,
chickens, etc., rather than any great increase in the number of female
"breadwinners." The fact that over 30 per cent of the returns made by
females reported zero or negative incomes is further evidence that the
large number of extremely small incomes reported was largely the result
of the schedule calling for income returns from all persons "possessed of
property." 
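The percentages in the preceding paragraph hang together arithmetically. A short Python check (the variable names are illustrative) recovers the 10.28 per cent of the female population reporting zero or negative incomes, shows that this group made up just over 30 per cent of all female returns, and gives the gap between women reporting positive incomes in 1915 and female "breadwinners" in 1911.

```python
# Percentages of the total female population of Australia.
MADE_CARDS_1915 = 33.46       # women who filled out income cards
POSITIVE_INCOME_1915 = 23.18  # women reporting positive incomes
BREADWINNERS_1911 = 18.6      # female "breadwinners," census of 1911

zero_or_negative = round(MADE_CARDS_1915 - POSITIVE_INCOME_1915, 2)
share_of_female_returns = round(100 * zero_or_negative / MADE_CARDS_1915, 1)
excess_over_breadwinners = round(POSITIVE_INCOME_1915 - BREADWINNERS_1911, 2)

print(zero_or_negative, share_of_female_returns, excess_over_breadwinners)
```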

Negative  incomes  arise  in  general  from  business  or  speculative  losses. 
Bad  as  may  be  the  condition  of  any  laboring  class,  its  members  are  seldom 
faced  with  negative  incomes.  It  is  unlikely  that  many  of  the  249,476 
females  reporting  "deficit  and  nil"  were  wage-earners.  They  were  in 
general  the  owners  of  small  investments  which  showed  losses,  such  as 
town  lots  upon  which  taxes  had  been  paid.3 

1  See  Income  in  the  United  States,  Vol.  I,  pp.  128,  129,  132-135. 

2  All  persons  are  classified  as  "breadwinners"  or  as  "dependents"  by  the  Australian  census. 
Male  "breadwinners"  in  Australia  constituted  in  1911,  according  to  the  census  of  that  year, 
69.4  per  cent  of  the  total  male  population,  female  "breadwinners"  18.6  per  cent  of  the  total 
female  population,  and  total  "breadwinners"  45.0  per  cent  of  the  total  population.  These 
figures  compare  with  American  census  figures  for  1910  showing  males  "gainfully  employed" 
to  constitute  63.6  per  cent  of  total  males,  females  "gainfully  employed"  18.1  per  cent  of 
total  females,  and  total  "gainfully  employed"  41.5  per  cent  of  the  total  population. 

3 It is worth noting that in the Australian schedule "rates and taxes paid" could be deducted
before making an income return. This consideration may be of some importance in
explaining the very large number of small, zero, and negative incomes.
While the frequency curve for Australian males is much more like the
American  distribution  than  the  curve  representing  both  male  and  female 
Australian  income  recipients,  even  it  shows  a  much  greater  concentration 
in  the  lowest  income  intervals  than  does  the  American  distribution.  This 
can  probably  be  accounted  for  to  some  extent  by  a  large  number  of  income 
returns  for  young  male  "dependents"  "possessed  of  property." 

The essential difference in appearance between the American income-distribution
curve which we presented in Volume I and the Australian
curve of 1915 is, then, probably traceable to (1) Australian underreporting
and (2) Australian inclusion of a large number of "dependents" who received
petty incomes from property and who were in no important sense
"breadwinners" or "gainfully employed."

What shall we say about the desirability or undesirability of including
in an income frequency distribution dependents receiving petty incomes
from property? While it is true that their incomes, positive or negative,
are in a way as real as any other incomes, we must remember that probably
almost all individuals over six years of age not only receive but earn some
money income during each year. Shall we then include the entire population
over six years old in our distribution? As we approach this theoretical
limit it is seen that the concept becomes less and less practically or
even theoretically interesting. Both practically and theoretically we are
interested in the incomes of persons who, though they be minors, have
"economically come of age" and have entered into certain definite relations
to the machinery of factorial distribution. They are "breadwinners"
or "persons gainfully employed," and the concept back of such expressions,
though like many economic concepts somewhat of a compromise,
seems a good compromise for our purposes.

Defining  income  recipient  as  we  have,  we  cannot  use  the  Australian 
material  as  an  aid  to  the  graduation  or  adjustment  of  the  American  income- 
distribution  curve  in  its  lower  ranges.  In  the  upper  income  ranges,  the 
Australian  distribution  offers,  as  we  shall  see,  an  interesting  illustration 
of  the  same  double  swing  (letter  S)  appearance  of  the  curve  seen  in  some 
of  the  more  recent  American  data.1 

1  When  charted  on  a  double  log  scale. 


CHAPTER  30 
AMERICAN  INCOME  TAX  RETURNS 

At  the  beginning  of  the  preceding  chapter  attention  was  drawn  to  some 
reasons  why  income-tax  returns  cannot  take  the  place  of  an  adequate 
income  census.  Nevertheless  tax  returns  are  in  many  respects  the  most 
important  single  source  of  information  we  have  for  estimating  the  fre- 
quency distribution  of  incomes.  Were  there  neither  tax  returns  nor  in- 
come censuses  for  any  country,  it  is  difficult  to  see  how  we  could  make 
even  an  interesting  guess  as  to  the  distribution  of  income  in  the  upper 
ranges. 

American  income-tax  data  go  back  to  1913.  We  have  now  at  our  dis- 
posal returns  for  the  seven  years,  1913  to  1919,  inclusive.1  However,  the 
amount  of  information  given  in  the  official  reports  for  the  earlier  years 
1913,  1914  and  1915  is  not  great.  Little  is  shown  beyond  the  number 
of  returns  classified  by  large  income  intervals  and  the  same  returns  classi- 
fied by  districts.  The  1916  tax  report  is  the  most  voluminous  and  in  one 
respect  the  most  adequate  report  which  has  yet  appeared.2  It  contains 
a  set  of  tables  which  we  are  sorry  to  miss  in  the  later  reports,  showing 
the  frequency  distribution  of  incomes  by  separate  occupations.  Other 
features  of  this  report  which  have  been  retained  in  later  years  are  tables 
showing  both  number  of  returns  and  amount  of  net  income  for  each  income 
class  for  the  country  as  a  whole,  and  the  same  by  States;  tables  showing 
the  sources  of  the  income  returned  in  each  income  interval,  that  is  the 
amount  from  wages,  business,  property;  distribution  tables  arranged  by 
sex  and  conjugal  condition;  amounts  of  tax  collected  from  each  income 
class,  etc. 

Changes  in  the  Federal  Income  Tax  Law  during  the  period  have  not 
been  such  as  greatly  to  affect  any  conclusions  which  we  have  drawn  from 
the  data.  From  the  standpoint  of  this  investigation,  probably  the  most 
important  changes  in  the  law  relate  to  general  deductions,  professions,  and 
minimum  taxable  income. 

In  the  1916  returns  all  deductions  were  classified  as  general  deductions. 

1  The  Annual  Reports  of  the  Commissioner  of  Internal  Revenue  are  the  sources  for  American 
income-tax  data  for  the  years  1913  to  1915.  Since  1915  the  data  have  appeared  annually 
as  a  separate  Treasury  Department  publication  entitled  Statistics  of  Income. 

2  A  peculiarity  of  the  1916  data  is  that  the  returns  are  tabulated  as  family  rather  than  in- 
dividual returns.  "The  net  incomes  reported  on  separate  returns  made  by  husband  and  wife 
in  1916  are  combined  and  included  as  one  return  in  the  figures  for  the  several  classes."  Statis- 
tics of  Income,  1917,  p.  22. 
In the 1917 returns the types of deductions classified as general deductions
were  greatly  reduced;  not  even  contributions  were  included.  In  1918  the 
category  was  enlarged;  contributions,  for  example,  were  again  placed  in 
the  general  deductions  class.  Now  these  changes  affect  greatly  the  rela- 
tions between  net  and  total  income  from  year  to  year.  Reported  net  income 
was  in  1916  only  75.43  per  cent  of  reported  total  income,  in  1917  it  was 
92.67  per  cent,  in  1918  89.74  per  cent,  and  in  1919  88.51  per  cent.  As 
it is the total and not the net income which, in the Statistics of Income, is
divided  up  according  to  source,  such  fluctuations  as  the  above  interfere 
with  comparisons  of  different  years. 
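Tables 30B and 30C below divide the same wage and salary figures by net and by total income respectively, so the two kinds of percentage are linked by the identity amount/net = (amount/total) × (total/net). The short Python sketch below applies the year-by-year ratios just quoted to show how a source whose share of total income stayed constant would nevertheless appear to fluctuate as a share of net income. The constant 30 per cent share is purely hypothetical, and the ratios are the all-class averages, not the ratios within any particular income class.

```python
# Reported net income as a percentage of reported total income (from the text).
NET_AS_PCT_OF_TOTAL = {1916: 75.43, 1917: 92.67, 1918: 89.74, 1919: 88.51}


def pct_of_net(pct_of_total: float, year: int) -> float:
    """Re-express a share of total income as a share of net income:
    amount/net = (amount/total) * (total/net)."""
    return pct_of_total * 100 / NET_AS_PCT_OF_TOTAL[year]


# Hypothetical source that is a constant 30 per cent of total income every year.
for year in sorted(NET_AS_PCT_OF_TOTAL):
    print(year, round(pct_of_net(30.0, year), 2))
```

The same 30 per cent of total income would read as nearly 40 per cent of net income in 1916 but only about 32 per cent in 1917, a swing produced entirely by the changing treatment of deductions.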

While  income  from  professions  was  tabulated  separately  in  1916,  in  1917 
it  was  included  in  wages  and  salaries,  and  in  1918  and  1919  in  business. 

In  the  1913  to  1916  returns  exemptions  were  $3,000  per  annum  for  an 
unmarried  person,  or  a  married  person  not  living  with  his  wife  (or  her 
husband),  and  $4,000  per  annum  aggregate  exemption  for  married  persons 
living  together.1  In  the  1917  and  later  returns  these  minima  were  reduced 
to  $1,000  and  $2,000  respectively.  However,  the  increase  in  usefulness  for 
our  purposes  of  the  1917  and  later  returns  was  even  greater  than  the 
lowered  minima  would  suggest.  Not  only  was  the  minimum  taxable 
income  lowered  from  $3,000  to  $1,000,  but  this  reduction  occurred  in  the 
face  of  a  rapidly  rising  general  level  of  incomes.  With  the  rise  in  incomes, 
$3,000  in  1918  or  1919  was  relatively  a  much  smaller  income  than  $3,000 
in  1913.  In  other  words,  we  might  logically  expect  $3,000  to  be  relatively 
further  down  the  income  distribution  curve  in  1918  than  in  1916  or 
1917. 
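The footnote to this paragraph scales the annual exemptions down to the ten months (March 1 to December 31) covered by the 1913 returns, that is, to five sixths of their annual amounts. A minimal Python sketch of that proration follows; the constant and function names are illustrative and are not taken from the law or the reports.

```python
from fractions import Fraction

# Annual exemptions under the 1913-1916 law (figures from the text).
ANNUAL_EXEMPTION = {"single": 3_000, "married couple": 4_000}

# The 1913 returns covered March 1 to December 31 only: 10 of 12 months.
MONTHS_COVERED_1913 = 10


def prorate(annual: int, months: int) -> Fraction:
    """Scale an annual exemption in proportion to the months covered."""
    return Fraction(annual) * Fraction(months, 12)


minima_1913 = {k: prorate(v, MONTHS_COVERED_1913) for k, v in ANNUAL_EXEMPTION.items()}
for status, amount in minima_1913.items():
    print(f"{status}: ${float(amount):,.2f}")
```

Exact fractions are used so that the married-couple minimum comes out as 10,000/3 before being rounded to $3,333.33 for display.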

The  accuracy  of  the  reporting  is,  of  course,  a  matter  of  great  importance 
for  this  investigation.  Now,  while  it  does  not  seem  possible  to  measure 
directly  from  the  data  changes  in  accuracy  of  reporting  during  the  period, 
the  rapid  expansion  of  the  income-tax  organization  and  its  increasing 
attention  to  the  investigation  and  checking  of  returns  establish  the  pre- 
sumption of  greater  statistical  value  in  the  reports  for  the  later  years. 
Offsetting  this  to  an  unknown  degree  is  the  apparently  increasing  amount 
of  "legal  evasion"  in  the  higher  income  classes.  The  reporting  for  the 
years  1913,  1914,  1915  and  1916  appears  to  have  been  peculiarly  bad  in 
the  lower  income  ranges.  The  distinct  improvement  in  1917  (compare 
the  1917  returns  with  those  for  earlier  years  in  Tables  28B,  28C,  28D,  28E, 
and  Charts  27  and  28  of  Volume  I)  seems  associated  with  the  patriotic 
enthusiasm  engendered  by  the  war.  Upon  our  entry  into  the  war,  not 
only did the Bureau of Internal Revenue make an increased effort to obtain

1 As the returns for 1913 were for income received for the ten months March 1 to December
31, 1913, the actual minima used for reporting purposes were $2,500 and $3,333.33 (i. e., 5/6
of $3,000 and $4,000 respectively).

correct returns but individuals, under the spur of patriotism, seem to
have made less effort to evade.1

The  remainder  of  this  chapter  is  concerned  largely  with  a  discussion 
of  possible  irregularities  in  the  distribution  of  non-reporting  and  under- 
statement in  the  later  years.  While  the  total  amount  of  non-reporting 
and  understatement  was  almost  certainly  greater  in  the  returns  for  1917 
than  in  those  for  1918  and  1919,  are  we  sure  that  the  non-reporting  and 
understatement  of  these  later  years  are  not  possibly  more  irregularly  dis- 
tributed along  the  frequency  curve  than  was  the  case  in  1917?  Is  it 
possible  that  the  improvement  in  the  accuracy  of  the  published  returns 
for  1918,  as  compared  with  those  for  1917,  was  so  much  greater  in  the 
income  intervals  under  $5,000  that  the  resulting  change  in  the  shape  of 
the  frequency  curve  may  amount  to  something  almost  akin  to  an  "over- 
adjustment"? 

Income  returns  by  individuals  are  made  on  two  types  of  blanks,  a  blank 
to  be  filled  in  by  persons  reporting  incomes  under  $5,000  and  another 
blank  to  be  filled  in  by  persons  reporting  incomes  over  that  figure.  Now, 
while  the  returns  of  incomes  under  $5,000  and  made  on  "under  $5,000" 
blanks  are  examined,  investigated  and  audited  in  the  field  soon  after 
their  receipt,  the  investigation  and  audit  of  the  returns  for  incomes  over 
$5,000  are  handled  in  Washington.  If  an  individual  has  an  actual  income 
of  $8,000  but  reports  $4,600  (on  an  "under  $5,000"  blank),  as  soon  as  a 
Field  Collector  discovers  this  discrepancy,  he  passes  the  matter  over  to 
the  Revenue  Agent  in  charge  of  the  District  for  Field  Investigation.  The 
return,  accompanied  by  the  Agent's  report,  is  forwarded  to  Washington 
for  final  audit.  Thus  the  Field  Collectors  audit  only  returns  that  are  (a) 
made  on  "under  $5,000"  blanks  and  (b)  believed,  after  investigation,  to  be 
for  incomes  which  are  actually  under  $5,000. 
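The routing rule just described reduces to a two-part test. The Python sketch below encodes it and uses the case from the text (an $8,000 income reported as $4,600) as a worked example; the `Return` record and its fields are hypothetical conveniences, not anything used by the Bureau.

```python
from dataclasses import dataclass

DIVIDING_LINE = 5_000  # income separating the two kinds of return blank


@dataclass
class Return:
    # Hypothetical record of an individual return, for illustration only.
    blank: str                 # "under $5,000" or "over $5,000"
    reported_income: int
    investigated_income: int   # income believed correct after field investigation


def audited_in_field(r: Return) -> bool:
    """True if the Field Collectors complete the audit: the return was made on an
    "under $5,000" blank AND is believed, after investigation, to be under $5,000.
    Every other return is forwarded to Washington for final audit."""
    return r.blank == "under $5,000" and r.investigated_income < DIVIDING_LINE


# The case from the text: $8,000 actual income reported as $4,600.
case = Return(blank="under $5,000", reported_income=4_600, investigated_income=8_000)
print("field" if audited_in_field(case) else "Washington")
```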

While  the  Field  Audit  of  returns  of  these  incomes  is  well  under  way 
before  the  preparation  of  the  statistical  tables  in  the  Statistics  of  Income 
and  hence  appears  in  that  tabulation  to  an  unknown  extent,  the  Washing- 
ton audit  of  incomes  over  $5,000  has  hardly  begun  and  hence  the  amended 
figures  for  these  higher  incomes  do  not  appear  in  the  Statistics  of  Income. 
It is impossible to say exactly how much of the "bulge"2 which appears
in  the  $1,000  to  $5,000  interval  on  the  double  log  charts  of  the  1918  and 
1919  tax  income  distributions  is  caused  by  a  difference  in  the  accuracy 
of  the  published  figures  for  returns  of  incomes  under  and  over  $5,000. 
However,  the  Treasury  Department  states  that  "the  Statistics  of  Income 

1  It  must  not,  of  course,  be  assumed  that  the  increase  in  the  number  of  returns  in  1917  is 
traceable  solely  to  increased  goodness  of  reporting. 

2  Described  in  Chapter  28.  At  many  points  in  the  following  discussion  the  reader  should 
refer  back  to  the  presentation  of  the  case  for  heterogeneity  in  the  income-tax  data  contained 
in  Chapter  28. 
are compiled almost entirely from unaudited returns whether they be for
'under  $5,000'  or  'over  $5,000.'"  It  seems  probable  therefore  that  the 
sudden  change  in  slope  of  the  1918  curve  (on  a  double  log  scale)  at  about 
$5,000  can  be  explained  only  partially  by  a  change  in  accuracy  of  the 
published  returns  at  that  point. 

Moreover,  a  considerable  amount  of  evidence,  some  of  which  has  already 
been  presented  in  Chapter  28,  suggests  that  the  "bulge"  on  the  income 
curves  for  the  later  years  corresponds  to  a  reality  on  the  actual  income 
curves.  While  it  may  be  somewhat  over-accented  in  the  published  figures 
for  1918  and  1919,  and  while  the  figures  for  1917  might  have  shown  more 
of such a "bulge"1 had the reporting been better, we must not assume
that  the  published  figures  for  either  1917  or  1918  give  a  radically  incorrect 
picture  of  the  facts  merely  because  the  income  curves  for  the  two  years 
are  so  different.  The  dogma  of  the  similarity  of  the  income  curve  from 
year  to  year  has  little  evidence  to  support  it. 

It  is  by  no  means  certain  that  even  the  apparently  definite  and  sharp 
angles  on  the  curves  in  this  $4,000  to  $6,000  region  give  an  unreal  picture. 
While  it  is  true  that  we  find  the  same  angles  on  the  wages  and  salaries 
curve,  that  curve  itself  seems  heterogeneous.  An  income  distribution 
curve  composed  of  wage  and  salary  earners  (in  the  ordinary  sense  of  the 
terms)  may  well  cut  an  income  distribution  curve  composed  of  "salaried 
entrepreneurs,"  and  business  and  financial  experts  somewhere  in  the  lower 
income  ranges.  The  angle  on  the  composite  curve  may  give  a  decidedly 
accurate  picture  of  the  facts.2 

Let  us  see  what  light  the  data  throw  on  some  of  these  problems. 
Table  30A  showing  the  number  of  returns  for  the  lower  income  intervals 
in  1917,  1918,  and  1919  and  the  percentage  movements  from  year  to  year 
illustrates  the  great  increase  in  the  number  of  returns  in  the  under-$5,000 
intervals  between  1917  and  the  later  years. 

Chart  No.  28  of  Volume  I,  on  which  are  drawn  the  frequency  distributions 
for  each  year  from  1916  to  1919  on  a  double  log  scale,  shows  the  difference 
in  the  appearance  of  the  income  curves  for  the  three  years.  Examining 
that  chart  we  notice  that  the  1918  data-points,  which  in  the  upper  income 
ranges  run  nearly  as  smoothly  as  the  1917  points,  in  the  $4,000  to  $5,000 
interval  move  abruptly  upwards  and  from  there  on  into  the  lowest  income 
ranges  are  well  above  the  1917  points,  showing  on  the  chart  an  irregular, 
plateau-like  effect  in  these  lowest  income  ranges.  No  such  "plateau" 
is  apparent  on  the  1917  line.    The  year  1919  presents  in  that  chart  a 

1  While  the  1917  curve  runs  much  more  smoothly  in  the  $3,000  to  $6,000  range  than  either 
the  1918  or  1919  curves,  it  is  not  without  the  hint  of  a  bulge  beginning  at  about  $4,500.  See 
p.  412. 

2 In constructing the complete income distribution curve for 1918, published in Volume I,
the influence of changes in the accuracy of reporting around $5,000 income was probably
overestimated.
TABLE 30A

                          Number of returns                  Percentage increases
Income                                                  1918 over  1919 over  1919 over
intervals              1917        1918        1919       1917       1918       1917

$2,000-$3,000        838,707   1,496,878   1,569,741      78.47       4.87      87.16
3,000- 4,000         374,958     610,095     742,334      62.71      21.68      97.98
4,000- 5,000         185,805     322,241     438,154      73.43      35.97     135.81
5,000- 6,000         105,988     126,554     167,005      19.40      31.96      57.57
6,000- 7,000          64,010      79,152     109,674      23.66      38.56      71.34
7,000- 8,000          44,363      51,381      73,719      15.82      43.48      66.17
8,000- 9,000          31,769      35,117      50,486      10.54      43.77      58.92
9,000-10,000          24,536      27,152      37,967      10.66      39.83      54.74

similar  appearance  to  1918  though  the  absence  of  small  intervals  in  the 
range  immediately  above  $5,000  disguises  the  characteristics  of  the  curve 
materially.1 
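The percentage increases in Table 30A follow directly from the counts of returns. The Python sketch below recomputes them as (later − earlier) / earlier × 100, rounded to two decimals; the recomputed values agree with the printed ones to within rounding.

```python
# Number of returns from Table 30A: interval -> (1917, 1918, 1919)
RETURNS = {
    "$2,000-$3,000": (838_707, 1_496_878, 1_569_741),
    "$3,000-$4,000": (374_958, 610_095, 742_334),
    "$4,000-$5,000": (185_805, 322_241, 438_154),
    "$5,000-$6,000": (105_988, 126_554, 167_005),
    "$6,000-$7,000": (64_010, 79_152, 109_674),
    "$7,000-$8,000": (44_363, 51_381, 73_719),
    "$8,000-$9,000": (31_769, 35_117, 50_486),
    "$9,000-$10,000": (24_536, 27_152, 37_967),
}


def pct_increase(old: int, new: int) -> float:
    """Percentage increase from `old` to `new`, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


for interval, (r17, r18, r19) in RETURNS.items():
    print(interval,
          pct_increase(r17, r18),   # 1918 over 1917
          pct_increase(r18, r19),   # 1919 over 1918
          pct_increase(r17, r19))   # 1919 over 1917
```

Laid out this way, the contrast the text draws is easy to see: the 1918-over-1917 increases are largest below $5,000, while the 1919-over-1918 increases are largest above it.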

The change in the contour of the lower range of the tax income frequency
curve from 1917 to 1918 and 1919 is, as we have mentioned, associated
with  a  large  increase  in  the  relative  amount  of  income  from  wages  and 
salaries  in  the  lower  intervals.  Tables  30B  and  30C  are  interesting  in 
this  connection.2 

The  1916  figures  in  Table  30B  are  introduced  simply  because  they 
are  computable.3  However,  too  much  weight  must  not  be  attached  to 
them.  The  1916  returns  are  undoubtedly  extremely  inadequate.  The 
high  percentages  that  year  from  $3,000  income  (the  1916  minimum)  up 
to  about  $10,000  may  possibly  be  the  result  of  the  ease  with  which  salary 
returns  (as  opposed  to  wage,  business,  or  other  returns)  are  obtainable. 
The  $4,000  to  $5,000  interval  is  the  lowest  comparable  interval  for  the 
four  years.4    In  that  interval  the  numbers  of  returns  by  years  were: 

1916     72,027
1917    185,805
1918    322,241
1919    438,154


1 When chart 28 was drawn for Volume I, only "preliminary" large interval data were
available. Final small interval data show a "bulge" very similar to that seen in the 1918 line.

2  The  1917  official  wages  figures  include  income  from  professions.  The  1918  and  1919  wages 
figures  do  not.  This  makes  the  increase  in  the  percentages  in  1918  still  more  striking.  In- 
come from  professions  was  tabulated  separately  in  1916,  but  was  included  in  the  wages  figures 
for  that  year  in  order  that  1916  and  1917  might  be  comparable. 

3  No  data  are  available  from  which  corresponding  figures  for  1913,  1914  or  1915  might 
be  calculated. 

4 The $3,000-$4,000 interval did not, in 1916, include married persons making a joint return.
TABLE 30B

PER CENT THAT INCOME FROM WAGES AND SALARIES IN EACH NET
INCOME CLASS WAS OF TOTAL NET INCOME IN THAT CLASS

Income class                  1916     1917      1918     1919

$1,000-$2,000                                   79.45    83.49
2,000-3,000                                     69.75    74.53
3,000-4,000                  76.98              55.21    61.86
2,000-4,000                           46.32   (64.42)  (69.45)
4,000-5,000                  66.86    36.30     48.85    52.48
5,000-10,000                 53.31    36.16     39.59    43.24
10,000-20,000                36.38    32.94     38.60    38.11
20,000-40,000                24.60    26.82     33.16    33.38
40,000-60,000                17.23    22.74     27.88    27.57
60,000-80,000                16.20    19.67     25.36    24.01
80,000-100,000               13.37    18.51     22.16    22.70
100,000-150,000              13.34    15.75     18.44    18.75
150,000-200,000               9.39    12.65     13.16    15.42
200,000-250,000               9.14    12.30     13.07    13.62
250,000-300,000               7.87     9.36     12.57    11.92
300,000-500,000               6.59    10.17     11.27    10.18
500,000-1,000,000             5.21     6.39      5.42     6.80
1,000,000-1,500,000           4.84     2.83      7.54     1.60
1,500,000-2,000,000           3.23     3.76      2.21    10.00
2,000,000 and over             .51     2.39       .85     4.02

The amounts of income from wages and salaries and from other net income
in the $4,000-$5,000 interval were, year by year, in millions of dollars:

                           1916     1917     1918     1919
Wages and salaries°         216      301      703    1,029
Other net income            107      528      736      931

° Income from professions is included in the 1916 and 1917 wages and salaries figures.

Each year's amount, expressed as a percentage of the preceding year's, was:

                        1917 over 1916   1918 over 1917   1919 over 1918
Wages and salaries           139.3            233.7            146.4
Other net income             493.0            139.4            126.6

It is plain that the great increase in the $4,000-$5,000 interval1 in 1917
was in income from other sources than wages and salaries.
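The amounts in millions can be checked against the $4,000-$5,000 row of Table 30B and against the year-to-year figures just given. In the Python sketch below, net income in the interval is taken as wages and salaries plus other net income. The year-to-year figures are each year's amount as a percentage of the preceding year's, so 139.3 means a rise of 39.3 per cent. Because the amounts are rounded to millions, the recomputed values differ from the printed ones by up to about half a point.

```python
# Amounts in the $4,000-$5,000 net income interval, millions of dollars (text).
WAGES = {1916: 216, 1917: 301, 1918: 703, 1919: 1_029}
OTHER = {1916: 107, 1917: 528, 1918: 736, 1919: 931}


def wage_share_of_net(year: int) -> float:
    """Wages and salaries as a percentage of net income in the interval."""
    return 100 * WAGES[year] / (WAGES[year] + OTHER[year])


def index_on_previous(series: dict, year: int) -> float:
    """This year's amount as a percentage of the preceding year's."""
    return 100 * series[year] / series[year - 1]


for year in (1916, 1917, 1918, 1919):
    print(year, round(wage_share_of_net(year), 2))
for year in (1917, 1918, 1919):
    print(year, round(index_on_previous(WAGES, year), 1),
          round(index_on_previous(OTHER, year), 1))
```

The drop in the wage share from about 67 per cent in 1916 to about 36 per cent in 1917 comes from the almost fivefold rise in other net income, not from any fall in wages.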

Table  30C  shows  the  wage  and  salary  figures  compared  with  total  income 
instead  of  net  income  as  in  Table  30B.  It  was,  of  course,  necessary  to  re- 
tain the  net  income  intervals  as  the  data  are  not  classified  in  total  income 

1 As may be seen from Tables 30B and 30C, the increase from 1916 to 1917 in income from
other sources than wages and salaries was greater than the increase in income from wages
and salaries not only in the $4,000-$5,000 interval but also in the $5,000-$10,000 interval.
intervals. Though the relations between years are different in this table
from  what  they  are  in  the  net  income  table,1  the  distribution  of  the  per- 
centages in  each  individual  year  shows  much  the  same  characteristics  in 
both  tables. 

TABLE 30C

PER CENT THAT INCOME FROM WAGES AND SALARIES IN EACH NET
INCOME CLASS WAS OF TOTAL INCOME IN THAT CLASS

Income class (Net)            1916     1917      1918     1919

$1,000-$2,000                                   74.67    77.25
2,000-3,000                                     65.42    69.14
3,000-4,000                  47.74              51.14    56.71
2,000-4,000                           41.82   (60.15)  (64.12)
4,000-5,000                  45.96    33.60     44.82    47.12
5,000-10,000                 36.38    33.87     33.55    36.60
10,000-20,000                25.76    30.89     33.10    32.70
20,000-40,000                18.81    25.20     28.76    28.36
40,000-60,000                13.75    21.23     23.79    23.39
60,000-80,000                12.76    18.56     21.51    20.33
80,000-100,000               10.74    17.61     19.00    19.25
100,000-150,000              11.06    15.05     15.92    15.40
150,000-200,000               7.68    12.01     13.10    12.41
200,000-250,000               7.83    11.75     11.22    11.26
250,000-300,000               6.64     8.71     10.73     9.80
300,000-500,000               5.50     9.59      9.62     8.19
500,000-1,000,000             4.35     5.88      4.37     5.38
1,000,000-1,500,000           4.12     2.62      6.29     1.34
1,500,000-2,000,000           2.82     3.54      1.81     8.54
2,000,000 and over             .47     2.18       .63      .32

The  percentages  in  Tables  30B  and  30C  show  each  year  a  sudden  increase 
(as  we  approach  the  lower  income  intervals)  somewhere  in  the  $4,000  to 
$5,000  or  the  $5,000  to  $10,000  interval.  At  exactly  what  point  each  year 
do  these  sudden  increases  seem  to  occur?  Charts  30D,  30E  and  30F  pre- 
sent the  material  in  a  slightly  different  form.  They  illustrate  the  relation- 
ship between  the  average  income  from  wages  and  salaries  in  each  net 
income  interval  and  the  average  total  income  in  the  same  net  income  in- 
terval for  the  years  1917,  1918  and  1919  on  a  double  log  scale.  The  1918 
and  1919  charts  immediately  suggest  the  improbability  of  being  able  to 
describe  the  data  by  a  single  simple  mathematical  expression.  To  the 
1918  data-points  have  been  applied  two  distinct  mathematical  curves, 
which  fit  the  data  remarkably  well  and  intersect  at  about  $6,700  total 
income.  The  curve  fitted  to  the  upper  income  ranges  is  a  parabola,  while 
that fitted to the lower income ranges is an hyperbola, one of whose asymptotes
is the 45° line which divides the chart into a "possible" and an "impossible" area.

1 Some reasons for the changes in relation of net to total income from year to year are
mentioned on pages 401 and 402.

CHART 30F

The equations of the two (1918) curves on a double log scale, where x is the logarithm
(base 10) of average total income and y that of average income from wages and salaries, are

$$(\mathrm{I})\quad y + 3.92945 - 2.744\,x + 0.22\,x^{2} = 0 \qquad \text{(parabola)}$$

$$(\mathrm{II})\quad y^{2} - 3.981909\,y - 0.867246\,xy + 3.981909\,x - 0.132754\,x^{2} - 0.060262 = 0 \qquad \text{(hyperbola)}$$

As it is difficult to estimate accurately by eye the goodness of fit of a curve
to data when charted on a log scale, Table 30E is introduced:

TABLE 30E

WAGES AND INCOME IN THE 1918 INCOME TAX RETURNS

                                        Average income from        Percentages
Net income               Average        wages and salaries         that data are of
intervals (1918)       total income     Data      Mathematical     mathematical
                                                  curves           curves

$1,000-$2,000               $1,566     $1,169      $1,178            99.2
2,000-3,000                  2,583      1,690       1,652           102.3
3,000-4,000                  3,710      1,897       1,955            97.0
4,000-5,000                  4,866      2,181       2,117           103.0
5,000-6,000                  6,388      2,192       2,216            98.9
6,000-7,000                  7,620      2,537       2,555            99.3
7,000-8,000                  8,952      2,963       3,012            98.4
8,000-9,000                 10,148      3,341       3,407            98.1
9,000-10,000                11,214      3,747       3,760            99.7
10,000-11,000               12,207      4,171       4,078           102.3
11,000-12,000               13,707      4,555       4,542           100.3
12,000-13,000               14,263      4,806       4,709           102.1
13,000-14,000               15,922      5,529       5,204           106.2
14,000-15,000               16,778      5,801       5,455           106.3
15,000-20,000               20,167      6,375       6,400            99.6
20,000-25,000               25,859      7,891       7,860           100.4
25,000-30,000               31,704      9,196       9,211            99.8
30,000-40,000               39,644     10,711      10,872            98.5
40,000-50,000               52,319     12,639      13,192            95.8
50,000-60,000               64,327     14,963      15,066            99.3
60,000-70,000               74,848     16,576      16,539           100.2
70,000-80,000               90,437     18,764      18,459           101.7
80,000-90,000               98,379     19,273      19,351            99.6
90,000-100,000             111,515     20,447      20,682            98.9
100,000-150,000            139,520     22,212      23,163            95.9
150,000-200,000            211,959     27,758      27,829            99.7
200,000-250,000            259,487     29,107      30,068            96.8
250,000-300,000            317,578     34,076      32,226           105.7
300,000-400,000            409,756     44,393      34,786           127.6
400,000-500,000            514,882     38,967      36,847           105.8
500,000-750,000            765,905     27,582      39,765            69.4
750,000-1,000,000        1,013,846     61,183      41,229           148.4
1,000,000-1,500,000      1,426,182     89,710      42,199           212.6
1,500,000-2,000,000      2,084,715     37,118      42,199            88.0
2,000,000-3,000,000      3,263,673     50,178      40,729           123.2
3,000,000-4,000,000      4,515,732     11,013      38,753            28.4

The data of Table 30E move rather erratically in the intervals above
$300,000 per annum income. This is natural in view of the small number
of cases in these upper intervals. There were only 627 returns reporting
net incomes of over $300,000 per annum; this is less than one seventieth
of one per cent of the total number of returns. In the 28 intervals under
$300,000 per annum, 14 of the percentages show the data within one and
one half per cent of the mathematical values.
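The count of close fits can be reproduced from the last column of Table 30E. In the Python sketch below, the bound of one and one half per cent is treated as inclusive; the 98.5 in the $30,000-$40,000 interval is needed to reach 14.

```python
# Percentages that data are of mathematical curves, Table 30E (1918), for the
# 28 net income intervals from $1,000-$2,000 up to $250,000-$300,000.
PERCENTAGES = [
    99.2, 102.3, 97.0, 103.0, 98.9, 99.3, 98.4, 98.1, 99.7, 102.3,
    100.3, 102.1, 106.2, 106.3, 99.6, 100.4, 99.8, 98.5, 95.8, 99.3,
    100.2, 101.7, 99.6, 98.9, 95.9, 99.7, 96.8, 105.7,
]


def count_within(values, tolerance):
    """Count percentages lying within `tolerance` points of 100 (inclusive).
    round() guards against floating-point noise in the subtraction."""
    return sum(1 for v in values if round(abs(v - 100), 6) <= tolerance)


print(len(PERCENTAGES), count_within(PERCENTAGES, 1.5))
```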

These  mathematical  curves  have  not  been  introduced  as  being  in  any 
sense  the  "law"  of  the  data  but  merely  to  emphasize  how  smoothly  the 
data  curves  run  and  yet  how  unmistakable  a  sensation  they  give  us  of  two 
parts,  one  above  about  $6,700  total  income  and  one  below  that  figure.1 
It  would,  of  course,  be  quite  impossible  to  get  any  sort  of  approximation 
to the lower range data by extending the parabola fitted to the upper
income  ranges.     How  impossible  may  be  seen  from  Table  30EE. 

TABLE 30EE

WAGES AND INCOME IN THE 1918 INCOME TAX RETURNS

                                 Average income from wages         Percentages that
Net income          Average      and salaries                      data are of
intervals (1918)    total        Data      Hyper-     Para-        Hyper-     Para-
                    income                 bola       bola         bola       bola

$4,000-$5,000       $4,866      $2,181     $2,117     $1,574       103.0      138.6
3,000-4,000          3,710       1,897      1,955      1,152        97.0      164.7
2,000-3,000          2,583       1,690      1,652        745       102.3      226.8
1,000-2,000          1,566       1,169      1,178        391        99.2      299.0
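The two 1918 equations can be checked against the "mathematical" columns of Tables 30E and 30EE. The Python sketch below evaluates both curves, assuming (as the tabulated values bear out) that x and y are common logarithms of average total income and of average income from wages and salaries. The hyperbola is solved as a quadratic in y, and its smaller root is taken because that branch lies below the 45° line, in the "possible" area where wages cannot exceed total income. The crossing point is then found by bisection. All function names are illustrative.

```python
import math


def parabola(x: float) -> float:
    """Curve (I): y + 3.92945 - 2.744x + 0.22x^2 = 0, solved for y."""
    return -3.92945 + 2.744 * x - 0.22 * x * x


def hyperbola(x: float) -> float:
    """Curve (II) written as y^2 - b*y + c = 0; return the smaller root."""
    b = 3.981909 + 0.867246 * x
    c = 3.981909 * x - 0.132754 * x * x - 0.060262
    return (b - math.sqrt(b * b - 4 * c)) / 2


def wages_from(total_income: float, curve) -> float:
    """Average wages and salaries implied by a curve at an average total income."""
    return 10 ** curve(math.log10(total_income))


def intersection(lo: float = 5_000, hi: float = 10_000) -> float:
    """Total income at which the two curves cross, by bisection on the log scale."""
    a, b = math.log10(lo), math.log10(hi)

    def f(x: float) -> float:
        return parabola(x) - hyperbola(x)

    for _ in range(60):
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:   # sign change in [a, mid]
            b = mid
        else:
            a = mid
    return 10 ** ((a + b) / 2)


if __name__ == "__main__":
    for total in (4_866, 3_710, 2_583, 1_566):   # lowest four 1918 intervals
        h = wages_from(total, hyperbola)
        p = wages_from(total, parabola)
        print(f"${total:>6,}: hyperbola ${h:,.0f}, parabola ${p:,.0f}")
    print(f"curves cross near ${intersection():,.0f} total income")
```

Extending the parabola below the crossing point gives the steeply falling values of the parabola column in Table 30EE, while the hyperbola stays close to the data.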

The  1919  data  show  the  same  two-curve  appearance  as  the  1918  data. 
This  may  be  clearly  seen  from  chart  30F.2  The  intersection  of  the  two 
curves  would  be  at  about  $7,100  instead  of  $6,700  as  on  the  1918  chart. 
Is  there  any  sign  of  such  a  change  from  one  curve  to  another  on  the  1917 
data?  There  seems  to  be.  Chart  30D  shows  the  1917  data  with  a  parabola 
fitted  to  the  observations  above  the  first  interval.  This  curve  and  Table 
30D  give  us  a  strong  impression  that  the  first  interval  cannot  be  described 
by  any  simple  curve  which  describes  the  remainder  of  the  data.  The  same 
two-curve  characteristics  as  the  1918  and  1919  data  are  strongly  suggested. 

The equation of the 1917 parabola on a double log scale is

$$y + 1.8417 - 1.8346\,x + 0.124\,x^{2} = 0.$$

The poorness of the fit to the first interval and the comparative goodness of the fit
to the remainder of the data as high as $250,000 per annum may be seen from Table 30D.
If the data were numerous enough to permit our fitting two curves, they would probably
intersect at about $4,500.

1  An  alteration  in  the  size  of  the  intervals  in  which  the  data  are  quoted  by  the  Income  Tax 
Bureau  would  of  course  change  the  data  curve  to  some  extent.  However,  taking  the  intervals 
as  they  come  and  fitting  the  curves  to  them  we  get  the  unmistakable  impression  of  great  regu- 
larity.   It  seemed  scarcely  worth  while  to  fit  the  curves  to  areas  rather  than  points. 

2 The story told by Chart 30F is so plain that it seemed hardly necessary to fit another set of
curves.
TABLE 30D

WAGES AND INCOME IN THE 1917 INCOME TAX RETURNS

                                        Average income from        Percentages
Net income               Average        wages and salaries         that data are of
intervals (1917)       total income     Data      Mathematical     mathematical
                                                  curve            curve

$2,000-$4,000               $3,059     $1,280      $1,101           116.3
4,000-5,000                  4,818      1,619       1,688            95.9
5,000-10,000                 7,210      2,442       2,422           100.8
10,000-20,000               14,623      4,517       4,374           103.3
20,000-40,000               29,236      7,368       7,411            99.4
40,000-60,000               51,940     11,024      11,038            99.9
60,000-80,000               72,811     13,516      13,699            98.7
80,000-100,000              93,742     16,510      15,992           103.2
100,000-150,000            126,979     19,108      19,081           100.1
150,000-200,000            181,156     21,758      23,147            94.0
200,000-250,000            233,880     27,501      26,388           104.2
250,000-300,000            293,905     25,587      29,478            86.8
300,000-500,000            398,517     38,204      33,877           112.8
500,000-1,000,000          740,769     43,558      43,632            99.8
1,000,000-1,500,000      1,294,619     33,973      52,845            64.3
1,500,000-2,000,000      1,812,388     64,201      58,358           110.0
2,000,000 and over       4,551,718     99,132      71,945           137.8

Both the regularity of the data curves and the positions of the inter-
sections of the mathematical curves1 might suggest that heterogeneity
of  the  wages  and  salaries  data  was  the  primary  cause  of  the  irregularity 
in  the  total  income  curve.  The  position  of  the  points  of  intersection  of  the 
mathematical  curves  might  seem  inconsistent  with  a  sudden  change  in 
accuracy  of  reporting  at  exactly  $5,000. 

However, this argument does not appear so conclusive when we examine
the  actual  amount  of  wages  in  each  income  interval.  The  constitution  of 
the  reported  income  each  year  may  be  seen  rather  plainly  in  Charts  28T, 
28U,  28V,  28W,  28X,  28Y,  28Z,  and  28AA.2  These  charts  show  the  number 
of  dollars  per  dollar  income  interval  reported  in  each  income  interval  by 
sources  for  the  years  1916  to  1919.3  They  not  only  illustrate  the  fact  that 
the  constitution  of  the  income  curve  changes  radically  as  we  move  from 
small  to  large  incomes  but  also  picture  the  salient  characteristics  of  these 
changes;  each  source  curve,  being  charted  on  a  double  log  scale,  may  be 

1  Particularly  the  1919  intersection  which  is  above  the  $5,000  to  $6,000  net  income  interval. 

2  See  pages  385  to  392. 

3  The  five  lines  representing  wages,  business,  rents,  interest,  and  dividends  were  found  to 
interweave  to  such  an  extent  when  drawn  on  one  chart  that  two  charts  were  drawn  for  each 
year,  one  representing  wages  and  business  and  the  other  incomes  from  property. 

Wages  includes  "salaries,  wages  and  commissions"  and  in  1916  and  1917  "professions  and 
vocations." 

Business  includes  "business,"  "partnerships,  personal  service  corporations,  estates,  and 
trusts,"  and  "profits  from  sales  of  real  estate,  stocks,  bonds,  etc.,"  and  in  1918  and  1919 
"professions." 

Rents  includes  royalties. 

Interest  includes  unclassified  investment  income. 
seen at a glance in its entirety. We see from Charts 28X and 28Z that, though the ratio of the income from wages and salaries to total income may, when charted, show an angle above $5,000, the entire "bulge" on the wages and salaries curve itself occurs in the under-$5,000 intervals in both 1918 and 1919. Moreover, while "wages and salaries" is the largest item in these lowest income intervals, and hence is the controlling factor in determining the peculiar shape of the total curve in this region, it is not the only item showing irregularities and "bulges." Some of these movements are extremely difficult to explain. Why should a "bulge" appear on the lower income ranges of the "rent" curve in 1918 and by 1919 become pronounced?1 The appearance of a bulge on the wage curves in 1918 and 1919 seems quite explicable on the basis of heterogeneity within the wage and salary data themselves, but one feels a shade less confidence in any explanation of why that curve moved in this peculiar manner if the explanation does not also seem clearly applicable to the rents curve, which moved in an apparently similar manner.

1 A mere increase in rents will not, of course, account for this unevenness in their distribution.
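Charts 28T to 28AA plot, for each source, the dollars reported per dollar of interval width. This lets wide and narrow income intervals be compared on one double log chart. A minimal Python sketch of that calculation follows. The figures are hypothetical and are not taken from the returns. Placing each point at the arithmetic midpoint of its interval is our assumption, not a documented detail of the charts.

```python
import math


def dollars_per_dollar_interval(intervals, amounts):
    """Amount reported in each income interval divided by the interval's width."""
    return [amount / (upper - lower) for (lower, upper), amount in zip(intervals, amounts)]


# Hypothetical income intervals (dollars) and wages reported in each (dollars).
intervals = [(1_000, 2_000), (2_000, 3_000), (3_000, 5_000), (5_000, 10_000)]
wages_reported = [400_000_000, 300_000_000, 360_000_000, 250_000_000]

density = dollars_per_dollar_interval(intervals, wages_reported)
# Double log coordinates: log of the interval midpoint (our assumption) and log of the density.
points = [(math.log10((lo + hi) / 2), math.log10(d))
          for (lo, hi), d in zip(intervals, density)]
print(density)  # → [400000.0, 300000.0, 180000.0, 50000.0]
```

Without the division, the $3,000 to $5,000 interval would look larger than the $2,000 to $3,000 interval merely because it is twice as wide.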


CHAPTER  31 

INCOME  DISTRIBUTIONS  FROM  OTHER  SOURCES  THAN 
INCOME  TAX  RETURNS 

Concerning  the  frequency  distribution  of  incomes  over  $3,000  or  $4,000 
per  annum  we  have  almost  no  information  aside  from  the  income  tax 
returns.  Existing  wage  distributions  and  non-tax  income  distributions 
almost  never  reach  higher  than  $2,500  or  $3,000  per  annum. 

Even in the lower income ranges (under, say, $2,500 or $3,000) most of the existing non-tax income distributions are of little use in our problem. In the first place, there are fewer than half a dozen distributions of this sort that are not such small samples as to prevent our feeling much confidence in their representative nature.1 An even more serious defect of every such distribution known to us, with one exception,2 is that the purpose for which the data have been collected almost inevitably makes them extremely ill-adapted to our use. For example, one of the largest recent samples is prefaced by almost a page of introduction explaining what types of recipients were purposely excluded.3 This is rather typical. To base upon such distributions any wide generalizations with respect to the income curve for the country as a whole, or even for the localities from which such data were collected, would be unwarranted.

Furthermore, almost without exception these studies in income distribution are on a family basis. While it is sometimes possible to make a

1  For  example,  Chapin's  well-known  investigation  into  the  distribution  of  incomes  includes 
only  391  workingmen's  families,  and  the  best  distribution  of  farmers'  incomes  includes  only 
401  farmers  from  a  single  state. 

2  Arthur  T.  Emery's  distribution  of  income  among  1960  Chicago  households. 

3 "In studying the sources of income and the importance of each source with relation to the total income of a family the following limitations to the type of family schedules should be kept in mind. No families were scheduled in which there were children who lived as boarders, that is, paid a certain sum per week or per month for board and spent the remainder of their earnings or salary as they saw fit. No families were scheduled which kept any boarders. The number of lodgers to be kept by a family was limited to three at any one time. No families were scheduled in which the total earnings of the family did not equal 75 per cent or more of the total income. It will be seen that these limitations excluded a large number of families and this materially affects the percentage of families having earnings from children and income from lodgers, and also results in showing a larger percentage of the total income as coming from the earnings of the husband than would be the case if the type of families named had not been excluded from the study. It also reduces the actual amount per family earned by children and received from boarders or lodgers that would be shown in case a cross section of a community including all the types mentioned were used. The object in making the exclusions named was to secure families dependent for support, as largely as possible, upon the earnings of the husband. Of course, it was impracticable to secure a sufficient number of families in which the only source of income was the earnings of the husband, but in following the course named the percentage of families having an income from other sources has been very largely reduced." "Cost of Living in the United States: Family Incomes," Monthly Labor Review, Dec., 1919, p. 30.
rough estimate of the individual incomes from the family data, such estimates are not what are needed for our purposes. They can show nothing but the distribution of income among the individuals constituting these families, and these families are almost inevitably so chosen as to make the individuals composing them not representative of income recipients at large. Analysis of the distribution of earnings among the individual members of such families discloses a heterogeneity so extreme as to result in a pronouncedly duomodal (bimodal) distribution curve. The fathers' incomes have one mode while the children's incomes have another. Chart 31A, showing a natural scale frequency distribution of earnings among 2811 individuals in 2170 families in 1918,1 exhibits this duomodal appearance in a striking manner. The "families" had been so chosen as to exclude both young married couples having no children and unmarried but independent wage earners. Investigations planned to bring out the economic characteristics of such "typical families," while they may be extremely valuable for the purposes for which they were undertaken, are necessarily of but little use in the construction of a frequency distribution of all individual incomes in the community. Moreover, even if we were attempting to construct a family and not an individual distribution, these data would not generally be particularly helpful, for, in addition to the exclusions just mentioned, further narrow and rigid restrictions are usually, and for the purposes in view quite properly, imposed upon the definition of the "typical family."

1 This is a sample from the 12,096 white families referred to in note 3, page 415. The
detailed  figures  of  this  sample  were  tabulated  for  us  by  the  Bureau  of  Labor  Statistics. 
They  cover  15  cities  chosen  as  representative  of  the  whole  list.  Each  one  of  the  15  cities 
shows  the  duomodal  appearance  referred  to  in  the  text. 
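The duomodal shape described above is what one expects when two groups with distinct typical earnings are pooled. The sketch below adds two hypothetical frequency tables, one for fathers and one for children. Each table has a single mode, and the code locates the local maxima of the pooled table. The counts are invented for illustration; only the mechanism comes from the text.

```python
def local_modes(counts):
    """Indices of interior intervals whose count exceeds both neighbouring counts."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]]


# Hypothetical earners per $200 earnings interval, from $0-$200 up to $1,800-$2,000.
children = [30, 90, 120, 70, 25, 8, 2, 0, 0, 0]
fathers = [0, 2, 10, 40, 90, 140, 110, 60, 25, 8]
pooled = [c + f for c, f in zip(children, fathers)]

print(local_modes(children), local_modes(fathers), local_modes(pooled))  # → [2] [5] [2, 5]
```

Each group alone has one mode ($400 to $600 for the children, $1,000 to $1,200 for the fathers). The pooled table has both.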
As incidentally remarked above, there is one non-tax income frequency distribution to which many of the above criticisms do not apply. It is the distribution of income among 1960 Chicago "households" in 1918 from an investigation made by Mr. Arthur T. Emery for the Chicago Daily News.1 Instead of attempting to describe a "typical family," Mr. Emery attempted to discover the "household" income of each person whose name came at the top of a page in the Chicago city directory. Mr. Emery encountered many difficulties in attempting to follow out this scheme and has himself pointed out sources of error.2 Notwithstanding the inevitable difficulties, Mr. Emery seems to have made a real effort to obtain a scientific sample. While his distribution shows unmistakable irregularities, it is in many respects, for our purposes, the most interesting and suggestive recent non-tax income distribution available.

Finally, it seems impossible to obtain from these distributions any but extremely general conclusions concerning the relation between income from effort and income from property. The data have almost always3 been so chosen as to eliminate any families obtaining an appreciable fraction of their income from property. While they may give us some clues as to the shape of the upper range tail of the wage-earners' income distribution curve,4 they can tell us little about even the upper tail of the general income curve and almost nothing about the lower income tail of either the wage-earners' or the general income curve.

1  While  the  Bureau  is  not  at  liberty  to  publish  this  material  we  were  permitted  to  make 
what  use  we  could  of  it  in  constructing  our  income  curve  for  the  country. 

2 In a letter to the Bureau he writes, "There was, however, one important source of error in this method — the poorer and middle class residents were willing to talk, and with the carefully trained approach of the investigator, the upper class was also won over, but we found in the wealthy districts that the butler and 'not at home' caused a large amount of travel on the part of the investigator," and often a final failure to obtain any report.

3 These remarks do not apply to the distribution of income among the 401 farmers or to Mr. Emery's distribution. However, in the case of Mr. Emery's distribution, the Bureau has no figures for income from property.

4 Compare pages 378, 379, 380.


CHAPTER  32 
WAGE  DISTRIBUTIONS 

There is in all an immense amount of American wage data. On the other hand, as an investigator gets into his subject, he begins to realize that the material is more remarkable for its fragmentary nature than for its amount, great as that may be. For no recent year can he obtain wage distributions for more than about 8 per cent of those gainfully employed. Of course, if these 8 per cent were scattered over the different types of employment and localities in any truly random fashion, and if their wages were uniformly reported, much might be done with the material. As things are, however, whole occupations as important as agricultural labor and trade are almost unrepresented. Moreover, as we are interested in the amount of wages actually received during the year, it is rather discouraging to find that this is the one type of distribution which practically never occurs. Distributions of amounts actually earned in a month are almost as rare. There are a few distributions of amounts actually earned in a week or fortnight, but the great majority of wage distributions are distributions of wage rates (figures by the hour being the commonest) or of hypothetical earnings, generally known as full-time earnings per week.

Now it is in general impossible to construct a distribution of earnings from a distribution of rates. Earnings depend, of course, not only on rates but also on hours worked. However, we seldom know anything about the distribution of hours worked, and almost never do we know anything about the relation between rates and hours worked. Chart 32A illustrates how violent may be the difference in shape of the earnings and rates curves for the same individuals.1 The earnings distribution in this particular case not only shows a much greater scatter than the rates distribution but is of an entirely different shape, as may be seen from Chart 32B, where the data are drawn on a double log scale. Chart 32C shows the distribution of hours worked in a week for the same individuals. Now, though the slaughtering and meat packing industry may be an extreme example, what evidence we have suggests that distributions of rates and of earnings are rarely in close agreement. Moreover, the relation of the one distribution to the other changes as we pass from industry to industry.2
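A small example shows why earnings cannot be recovered from rates alone. In the Python sketch below, the same five hourly rates and the same five weekly hour totals are paired in two different ways. The figures are hypothetical, in cents and hours.

```python
from statistics import mean

# Hypothetical figures for five men: hourly rates (cents) and hours worked in one week.
rates_cents = [40, 45, 50, 55, 60]
hours = [30, 40, 48, 54, 60]


def weekly_earnings(rates, hrs):
    """Week's earnings in cents, person by person: rate times hours."""
    return [r * h for r, h in zip(rates, hrs)]


# The same rates and the same hours, paired in two different ways.
high_rates_long_hours = weekly_earnings(rates_cents, hours)
high_rates_short_hours = weekly_earnings(rates_cents, list(reversed(hours)))

print(high_rates_long_hours)   # → [1200, 1800, 2400, 2970, 3600]
print(high_rates_short_hours)  # → [2400, 2430, 2400, 2200, 1800]
print(mean(high_rates_long_hours), mean(high_rates_short_hours))
```

Both pairings use identical distributions of rates and of hours. Yet weekly earnings run from $12.00 to $36.00 (mean $23.94) in one case and from $18.00 to $24.30 (mean $22.46) in the other. Only the relation between rates and hours differs, and that relation is exactly what the published distributions leave unknown.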

1  43,063  Male  Employees  in  the  Slaughtering  and  Meat  Packing  Industry  in  1917.  Bureau 
of  Labor  Statistics,  Bulletin  262.  For  purposes  of  comparison  the  two  distributions  are  so 
placed  that  the  frequency  curves  show  the  same  arithmetic  means  and  areas. 

2 Resulting largely, of course, from the varying types of distributions of hours-worked-in-the-week (month or year) in different industries. Illustrations of lack of uniformity in the relation between rates and earnings of the same persons for the same period but in different industries were worked up from Professor Davis R. Dewey's Special Report on Employees and Wages for the 12th Census.

[CHART 32A. Frequency distributions of rates of wages per hour and earnings per week for 43,063 male employees in the slaughtering and meat packing industry in the U.S. in 1917. Source: Bureau of Labor Statistics, Bulletin 262. Natural scale.]

[CHART 32B. Frequency distributions of rates of wages per hour and earnings per week for 43,063 male employees in the slaughtering and meat packing industry in the U.S. in 1917. Source: Bureau of Labor Statistics, Bulletin 262. Logarithmic scales.]

[CHART 32C. Frequency distribution of hours worked in a week for 43,063 male employees in the slaughtering and meat packing industry in the U.S. in 1917. Source: Bureau of Labor Statistics, Bulletin 262. Natural scales; horizontal axis: hours worked in the week.]

The same difficulty that we find in any attempt to estimate the distribution of earnings per week from the distribution of rates per hour seems inherent in any attempt to estimate the distribution of earnings in a year from the distribution of earnings in a week. The unknown distribution of weeks worked in the year must seriously affect our results.1

Estimating the frequency distribution of wages earned in a year for an industry from the frequency distribution of wages earned in another year in the same industry, if we had such data, would involve us in a similar difficulty. Even though we knew the total number of individuals gainfully employed and their total wage bill each year, and also the frequency distribution of earnings for one of the years, estimating the frequency distribution for the other year would be hazardous. While some rates distributions for the same industry in the same locality show symptoms of not changing in shape very radically from year to year,2 this does not seem

1  We  have  no  distributions  of  amounts  earned  in  a  week  and  in  a  year  for  the  same  industry, 
with  which  to  illustrate  this  point  directly. 

2  For  example,  the  distribution  curve  for  wages  per  week  among  Massachusetts  factory 
workers  shows  a  moderate  degree  of  similarity  of  shape  from  year  to  year. 

Professor  H.  L.  Moore  (Political  Science  Quarterly,  vol.  XXII,  pp.  61-73)  discussed  the 
fluctuation  from  1890  to  1900  in  the  variability  of  wage  rates  in  a  total  made  up  of  thirty 
a sufficient reason for assuming the same of earnings distributions. The shape of the distribution representing hours or days worked in the year may be expected to change greatly from year to year with alternations of prosperity and depression.1
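The same reasoning carries over from weeks to years. Annual earnings depend on the distribution of weeks worked, and that distribution's scatter widens in depression. The sketch below combines one hypothetical weekly-earnings distribution with two hypothetical weeks-worked distributions, one for a prosperous year and one for a depressed year. It assumes that weekly earnings and weeks worked are independent, which is exactly the kind of relation the text says is not known.

```python
from collections import defaultdict
from fractions import Fraction


def annual_distribution(weekly, weeks):
    """Combine {weekly earnings: share} with {weeks worked: share} into
    {annual earnings: share}, ASSUMING the two are independent."""
    annual = defaultdict(Fraction)
    for wage, p in weekly.items():
        for n_weeks, q in weeks.items():
            annual[wage * n_weeks] += p * q
    return dict(sorted(annual.items()))


def mean_of(dist):
    """Mean of a {value: share} distribution whose shares sum to 1."""
    return sum(value * share for value, share in dist.items())


# Hypothetical distributions (dollars per week; weeks worked in the year).
weekly = {15: Fraction(1, 4), 20: Fraction(1, 2), 25: Fraction(1, 4)}
prosperous_year = {48: Fraction(1, 2), 52: Fraction(1, 2)}
depressed_year = {20: Fraction(1, 4), 35: Fraction(1, 4), 52: Fraction(1, 2)}

good = annual_distribution(weekly, prosperous_year)
bad = annual_distribution(weekly, depressed_year)
print(mean_of(good), min(good), max(good))  # → 1000 720 1300
print(mean_of(bad), min(bad), max(bad))     # → 795 300 1300
```

The weekly distribution is unchanged, yet mean annual earnings fall from $1,000 to $795, and the lowest annual figure drops from $720 to $300.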

What little evidence we possess suggests that wage distributions2 for individuals of the same sex in the same industry at the same date but in different localities, though generally more dissimilar in shape than distributions for the same industry in the same place but at different dates, are less unlike one another than distributions for different industries in the same place and at the same time. The variation in shape of such distributions for different industries is often extreme.3

selected  manufacturing  industries.  These  distributions  (for  1890  and  1900)  illustrate  both 
the  similarity  and  the  difference  in  rates  distributions  between  the  two  years. 

1 For example, what little information we have points to the "scatter" of the days-worked-in-a-year distribution being much greater in a year of depression than in a year of prosperity.

The  extreme  variations  in  shape  of  the  income  distributions  for  the  same  1240  individuals 
in  the  years  1914  to  1919  as  seen  in  the  Statistics  of  Income,  1919,  page  30,  are  interesting  in 
this  connection. 

2  Whether  earnings  or  rates. 

3 Examples of this are numerous. Charts 32D and 32E show the distribution of wages
per  week  among  Massachusetts  males  working  in  (a)  the  boot  and  shoe  industry  and  (b)  the 
paper  and  wood  pulp  industry.  For  purposes  of  comparison  the  two  distributions  are  so 
placed  on  the  natural  scale  chart  that  the  frequency  curves  show  the  same  arithmetic  means 
and  areas.  The  double  log  chart  is  based  directly  upon  the  natural  scale  chart.  It  was 
necessary  to  break  up  the  "over  $35"  interval  before  calculating  the  arithmetic  means. 
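Footnote 3 describes placing two distributions so that they show the same arithmetic mean and the same area. One way to do this is assumed here rather than taken from the text. Each interval's limits are expressed as multiples of the distribution's mean, and each frequency becomes a density per unit of that relative scale, so the total area is 1. A grouped mean needs a finite midpoint for every interval, so the sketch closes the open "over $35" class at a hypothetical upper limit of $50. The counts are invented.

```python
def normalise(bounds, counts):
    """Rescale a grouped distribution so that its mean is 1 and its area is 1.

    bounds: (lower, upper) dollar limits of each interval; counts: persons in each.
    Returns the limits as multiples of the mean and the density on that scale.
    """
    total = sum(counts)
    mu = sum((lo + hi) / 2 * c for (lo, hi), c in zip(bounds, counts)) / total
    relative = [(lo / mu, hi / mu) for lo, hi in bounds]
    density = [(c / total) / (hi - lo) for (lo, hi), c in zip(relative, counts)]
    return relative, density


# Hypothetical wages per week; the open "over $35" class is closed at an assumed $50.
bounds = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 35), (35, 50)]
counts = [50, 200, 300, 250, 150, 50]

relative, density = normalise(bounds, counts)
area = sum(d * (hi - lo) for (lo, hi), d in zip(relative, density))
mean_relative = sum((lo + hi) / 2 * d * (hi - lo) for (lo, hi), d in zip(relative, density))
```

If the same function is applied to each industry's table, both curves can be drawn on one chart with a mean of 1 and an area of 1.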
In conclusion, the order of importance of the variables as affecting the shape of the distribution curve seems to be: industry, place, time.

We  have  but  little  basis  for  estimating  total  income  from  earnings.  In 
the  preceding  chapter  on  Income  Distributions  from  other  Sources  than 
Income  Tax  Returns  attention  was  drawn  to  the  difficulty  of  arriving  at 
any  reliable  statement  of  relationship  between  earnings  and  income  from 
such  distributions  because  of  the  way  in  which  the  data  were  selected.  It 
is  even  less  possible  to  discover  the  nature  of  any  such  relationship  from  the 
income-tax  material.  Though  there  is  no  such  apparent  "selection"  in 
the  income-tax  data  as  in  the  case  of  non-tax  income  distributions,  the 
material  is  not  arranged  to  answer  our  particular  question. 

The non-statistical reader examining Charts 30D, 30E and 30F, on which are plotted average total income and average income from wages in each income interval, might think that it would be quite simple to estimate the probable average total income of persons having any specified wage. However, there is a profound statistical fallacy involved in the use of this material for any such purpose. As given in the official tables, income is the independent variable, wages the dependent. This condition cannot be reversed without retabulation of the original returns. The statistical student recognizes the problem as one involving the impossibility of deriving one regression line from the other when neither the nature of the equation representing the regression line1 nor the degree of relationship (correlation in the broad, non-linear sense) is known. Even if we knew that the average net income of those persons reporting in 1918 in the $5,000 to $6,000 net income class was $5,474 and the average wage obtained by these persons was $2,192, we would be quite unwarranted in concluding that the average income of persons receiving $2,192 per annum in wages was $5,474. If no wage earner received income from any source other than wages, we would still have a condition where the average income in the income class would be greater than the average wage. Total wages would necessarily be less than total income, because the income class includes not only wage earners but capitalists and entrepreneurs. But both total wages and total income are divided by the same number to get an average, namely the total number of persons in that income class.
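The averaging point above can be checked with a handful of hypothetical returns. As the text supposes, no wage earner in them has any income besides wages. The sketch computes class averages the way the official tables do, dividing total wages by every person in the income class. It then regroups the same returns by wage class.

```python
INCOME, WAGES = 0, 1  # positions within each (income, wages) record

# Hypothetical returns. As the text supposes, no wage earner has any other income.
returns = [
    (5200, 5200), (5600, 5600),  # wage earners in the $5,000-$6,000 class
    (5500, 0), (5900, 0),        # capitalists and entrepreneurs: no wages
    (2650, 2650), (2750, 2750),  # wage earners in a lower income class
]


def class_average(records, by, lo, hi, field):
    """Average of `field` over every record whose `by` value lies in [lo, hi)."""
    members = [r for r in records if lo <= r[by] < hi]
    return sum(r[field] for r in members) / len(members)


# Averages as the official tables compute them: everyone in the income class counts.
avg_income = class_average(returns, INCOME, 5000, 6000, INCOME)  # → 5550.0
avg_wage = class_average(returns, INCOME, 5000, 6000, WAGES)     # → 2700.0
# The same returns regrouped by wages.
income_of_wage_class = class_average(returns, WAGES, 2500, 3000, INCOME)  # → 2700.0
```

The $5,000 to $6,000 income class averages $5,550 of income and $2,700 of wages. Yet the persons who actually receive about $2,700 in wages average only $2,700 of income.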

This suggests a technical criticism of the material contained in the Statistics of Income. All data concerning the relation between two variables are always published there in such a manner as to give information concerning only one of the regression lines and no information whatever concerning the "scatter." If such data were published in the form of "correlation tables," the increase in usefulness for statistical analysis would be very great. Such "correlation tables" keep closer to the original data than the usual type of statistical tables. Freer use of them is much to be desired, particularly in cases where it is difficult to anticipate all the problems for whose solution investigators will go to the tabulated materials.
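A correlation table keeps the count of persons in each (income class, wage class) cell, so both regression lines can be read from the same table. The sketch below builds such a table from hypothetical class midpoints with `collections.Counter`. It then computes the mean of each variable within the classes of the other. A wage class of 0 stands for persons with no wage income.

```python
from collections import Counter, defaultdict

INCOME, WAGES = 0, 1  # positions within each (income class, wage class) cell key


def regression_of_means(table, given):
    """For each class of the `given` variable, the mean of the other variable."""
    other = 1 - given
    sums, persons = defaultdict(int), defaultdict(int)
    for cell, n in table.items():
        sums[cell[given]] += cell[other] * n
        persons[cell[given]] += n
    return {k: sums[k] / persons[k] for k in sorted(persons)}


# Hypothetical returns as (income-class midpoint, wage-class midpoint); 0 = no wages.
pairs = ([(1500, 1500)] * 6 + [(1500, 0)] * 2
         + [(5500, 1500)] * 2 + [(5500, 0)] * 4 + [(5500, 5500)] * 2)
table = Counter(pairs)  # the correlation table: persons in each cell

wages_given_income = regression_of_means(table, INCOME)  # → {1500: 1125.0, 5500: 1750.0}
income_given_wages = regression_of_means(table, WAGES)
```

From this single table, the mean wage in the $5,500 income class is $1,750, while the mean income of the $1,500 wage class is $2,500. Neither figure can be derived from the other without the cell counts, and the cell counts also preserve the scatter.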

1  The  difficulty  of  the  problem  is,  if  possible,  increased  in  this  particular  case  because  of 
the  fact  that  the  regression  is  radically  non-linear. 


CHAPTER  33 

THE  CONSTRUCTION  OF  A  FREQUENCY  CURVE  FOR  ALL 
INCOME  RECIPIENTS 

The direct and only adequate method of discovering the frequency distribution of income in the United States would be to define very carefully the terms income and income recipient and then have a carefully planned census taken by expert enumerators upon the basis of these definitions. The returns brought in by the enumerators should moreover be sworn to by the persons making them, and heavy penalties should be attached to the making of false or inaccurate returns. A less satisfactory method, but one which would probably give excellent results, would be to have a large number of truly random samples taken by such a census. The results of either procedure could then be adjusted in the light of other statistical information concerning the National Income and also in the light of theoretical conclusions derived from the data themselves.

Constructing an income frequency distribution for all income recipients in the United States from the existing data, a few of whose peculiarities have been noted in the preceding chapters, necessarily involves an extremely large amount of pure guessing. It is only because of the practical value of even the roughest kind of estimate that any statistician would think of attacking the problem. The method followed in the actual construction of the income frequency distribution has been outlined in volume I.1 This method contains one assumption after another that is open to question. Moreover, in many cases we feel quite unable to estimate the probable errors involved in these assumptions. Their only excuse is their necessity. What is the amount of under-reporting for income tax, and how is it distributed? What is the effect upon the returns of "legal evasion"? To what extent is the "bulge" on the income-tax returns in the region under about $5,000 in 1918 the result of the "intensive drive"? What is the relation between wages and total income by wage intervals? What is the relation between wage rates and earnings in any particular industry? Etc., etc. These are all questions which must be answered over and over again, and yet they are questions the answers to which must be, in many instances, almost pure guesses. And, to repeat, the margin of possible error is often large.

In view of the sparsity and inadequacy of the data, our first approach to the problem was an attempt to discover, if possible, some general mathematical law for the distribution of income. Were we to get any very definite and reliable clues as to the mathematical nature of the frequency distribution of income from small sample income distributions and from wage distributions, etc., such clues might of course be invaluable in checking the results obtained from piecing together existing wage distributions, income distributions, and other scattered information. We would be in the position of the astronomer who is able to "adjust" the results of his observations in the light of some known mathematical law. It soon became clear, however, that it is quite impossible to discover any essential peculiarities of the income frequency distribution. The available material is not only insufficient for purposes of such generalizations, but the distribution is moreover so dissimilar from year to year that any generalization of this nature is too vague to be of any practical value.

1 Income in the United States, Vol. I, pp. 122-139.

The method finally used for the construction of the income curve therefore has, we are sorry to say, practically all the weaknesses of the data from which it has been constructed. The occupations of the country were tabulated, and to each occupation were assigned those wage and income distributions which seemed applicable with the least strain. We then had a series of income and wage distributions which nominally covered nearly all the income recipients in the United States, though for some occupations the inadequacy of the wage and income samples was little short of absurd. The wage distributions were converted into income distributions on the assumption that the smaller the wage, the larger is its percentage of total income. Beyond this simple assumption, the particular functional relationships used for many industries were almost pure guesswork. Moreover, there was not only the danger of error in moving from wage distribution to income distribution, and the danger of error resulting from estimating a wage distribution for a particular industry in a particular locality from a similar though not identical industry in a different locality, but also the danger of error resulting from estimating a wage distribution for one year from a wage distribution for another.
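The conversion and pooling steps can be sketched as follows. The text says only that the smaller the wage, the larger its percentage of total income. The share function below, share = 1 / (1 + wage / 10,000), is a hypothetical stand-in with that property, and the two occupations and their counts are also invented. Each person is placed at the midpoint of his wage class and converted to income. The occupations are then pooled into common $500 intervals.

```python
from collections import Counter


def wage_share(wage):
    """HYPOTHETICAL share of total income coming from wages; it falls as the wage rises."""
    return 1 / (1 + wage / 10_000)


def to_income(wage_counts):
    """Convert {wage-class midpoint: persons} to {income: persons} via wage_share."""
    return {round(w / wage_share(w)): n for w, n in wage_counts.items()}


def pool(distributions, width=500):
    """Add several income distributions into common intervals of `width` dollars."""
    total = Counter()
    for dist in distributions:
        for income, n in dist.items():
            total[(income // width) * width] += n
    return dict(sorted(total.items()))


# Hypothetical occupations: {wage-class midpoint in dollars: persons}.
farm_labor = {500: 30, 1000: 20}
factory = {1000: 40, 1500: 25, 2000: 10}

combined = pool([to_income(farm_labor), to_income(factory)])
print(combined)  # → {500: 30, 1000: 60, 1500: 25, 2000: 10}
```

With this share function, a $1,000 wage implies $1,100 of income and a $2,000 wage implies $2,400. The pooled result is keyed by the lower limit of each $500 interval.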

The final results are probably not quite so bad as they might have been had we not had a number of collateral estimates with which roughly to check and otherwise adjust the first results of our estimates. For example, such independent information as Mr. King's estimate of the total income of the country and Mr. Knauth's estimate of the total amount of income from dividends provided checks with which the results of the frequency curve calculations were made to agree.
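Making the frequency-curve results agree with an outside total, such as an estimate of the total income of the country, could be done in several ways, and the text does not say which was used. The sketch below shows the simplest hypothetical choice: a single proportional factor applied to every income, so that the aggregate implied by the distribution matches the control total. The distribution and the control figure are invented.

```python
def scale_to_control_total(income_counts, control_total):
    """HYPOTHETICAL adjustment: multiply every income by one factor so that the
    aggregate income implied by the distribution equals an outside estimate."""
    implied_total = sum(income * n for income, n in income_counts.items())
    factor = control_total / implied_total
    scaled = {income * factor: n for income, n in income_counts.items()}
    return scaled, factor


estimate = {1000: 60, 2000: 30, 5000: 10}  # hypothetical {income: persons}
scaled, factor = scale_to_control_total(estimate, 204_000)
scaled_total = sum(income * n for income, n in scaled.items())
```

The number of persons in each class is unchanged; only the income levels move. A similar check could be run on a single component, such as dividends, against a separate estimate.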

Some  hypothetical  reasoning  is  inevitable  in  such  a  statistical  study  as 
the  present  one.  The  investigator  must  not  lose  heart.  Sir  Thomas 
Browne  in  his  rolling  periods  sagely  remarks  that  "what  song  the  Syrens 
sang,  or  what  name  Achilles  assumed  when  he  hid  himself  among  women, 
though  puzzling  questions,  are  not  beyond  all  conjecture!" 